Search | arXiv e-print repository

Scaling Laws for Data Filtering -- Data Curation cannot be Compute Agnostic

Authors: Sachin Goyal, Pratyush Maini, Zachary C. Lipton, Aditi Raghunathan, J. Zico Kolter

Abstract: Vision-language models (VLMs) are trained for thousands of GPU hours on carefully curated web datasets. In recent times, data curation has gained prominence with several works developing strategies to retain 'high-quality' subsets of 'raw' scraped data. For instance, the LAION public dataset retained only 10% of the total crawled data. However, these strategies are typically developed agnostic of… ▽ More Vision-language models (VLMs) are trained for thousands of GPU hours on carefully curated web datasets. In recent times, data curation has gained prominence with several works developing strategies to retain 'high-quality' subsets of 'raw' scraped data. For instance, the LAION public dataset retained only 10% of the total crawled data. However, these strategies are typically developed agnostic of the available compute for training. In this paper, we first demonstrate that making filtering decisions independent of training compute is often suboptimal: the limited high-quality data rapidly loses its utility when repeated, eventually requiring the inclusion of 'unseen' but 'lower-quality' data. To address this quality-quantity tradeoff ($\texttt{QQT}$), we introduce neural scaling laws that account for the non-homogeneous nature of web data, an angle ignored in existing literature. Our scaling laws (i) characterize the $\textit{differing}$ 'utility' of various quality subsets of web data; (ii) account for how utility diminishes for a data point at its 'nth' repetition; and (iii) formulate the mutual interaction of various data pools when combined, enabling the estimation of model performance on a combination of multiple data pools without ever jointly training on them. Our key message is that data curation $\textit{cannot}$ be agnostic of the total compute that a model will be trained for. Our scaling laws allow us to curate the best possible pool for achieving top performance on Datacomp at various compute budgets, carving out a pareto-frontier for data curation. Code is available at https://github.com/locuslab/scaling_laws_data_filtering. △ Less

Submitted 10 April, 2024; originally announced April 2024.

Comments: Published at CVPR 2024

arXiv:2403.09057 [pdf, other]

A Continued Pretrained LLM Approach for Automatic Medical Note Generation

Authors: Dong Yuan, Eti Rastogi, Gautam Naik, Sree Prasanna Rajagopal, Sagar Goyal, Fen Zhao, Bharath Chintagunta, Jeff Ward

Abstract: LLMs are revolutionizing NLP tasks. However, the use of the most advanced LLMs, such as GPT-4, is often prohibitively expensive for most specialized fields. We introduce HEAL, the first continuously trained 13B LLaMA2-based LLM that is purpose-built for medical conversations and measured on automated scribing. Our results demonstrate that HEAL outperforms GPT-4 and PMC-LLaMA in PubMedQA, with an a… ▽ More LLMs are revolutionizing NLP tasks. However, the use of the most advanced LLMs, such as GPT-4, is often prohibitively expensive for most specialized fields. We introduce HEAL, the first continuously trained 13B LLaMA2-based LLM that is purpose-built for medical conversations and measured on automated scribing. Our results demonstrate that HEAL outperforms GPT-4 and PMC-LLaMA in PubMedQA, with an accuracy of 78.4\%. It also achieves parity with GPT-4 in generating medical notes. Remarkably, HEAL surpasses GPT-4 and Med-PaLM 2 in identifying more correct medical concepts and exceeds the performance of human scribes and other comparable models in correctness and completeness. △ Less

Submitted 3 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

Comments: Accepted to NAACL 2024

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1110 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February… ▽ More In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content. △ Less

Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.00826 [pdf, other]

LLMGuard: Guarding Against Unsafe LLM Behavior

Authors: Shubh Goyal, Medha Hira, Shubham Mishra, Sukriti Goyal, Arnav Goel, Niharika Dadu, Kirushikesh DB, Sameep Mehta, Nishtha Madaan

Abstract: Although the rise of Large Language Models (LLMs) in enterprise settings brings new opportunities and capabilities, it also brings challenges, such as the risk of generating inappropriate, biased, or misleading content that violates regulations and can have legal concerns. To alleviate this, we present "LLMGuard", a tool that monitors user interactions with an LLM application and flags content aga… ▽ More Although the rise of Large Language Models (LLMs) in enterprise settings brings new opportunities and capabilities, it also brings challenges, such as the risk of generating inappropriate, biased, or misleading content that violates regulations and can have legal concerns. To alleviate this, we present "LLMGuard", a tool that monitors user interactions with an LLM application and flags content against specific behaviours or conversation topics. To do this robustly, LLMGuard employs an ensemble of detectors. △ Less

Submitted 27 February, 2024; originally announced March 2024.

Comments: accepted in demonstration track of AAAI-24

arXiv:2402.10567 [pdf, other]

InSaAF: Incorporating Safety through Accuracy and Fairness | Are LLMs ready for the Indian Legal Domain?

Authors: Yogesh Tripathi, Raghav Donakanti, Sahil Girhepuje, Ishan Kavathekar, Bhaskara Hanuma Vedula, Gokul S Krishnan, Shreya Goyal, Anmol Goel, Balaraman Ravindran, Ponnurangam Kumaraguru

Abstract: Recent advancements in language technology and Artificial Intelligence have resulted in numerous Language Models being proposed to perform various tasks in the legal domain ranging from predicting judgments to generating summaries. Despite their immense potential, these models have been proven to learn and exhibit societal biases and make unfair predictions. In this study, we explore the ability o… ▽ More Recent advancements in language technology and Artificial Intelligence have resulted in numerous Language Models being proposed to perform various tasks in the legal domain ranging from predicting judgments to generating summaries. Despite their immense potential, these models have been proven to learn and exhibit societal biases and make unfair predictions. In this study, we explore the ability of Large Language Models (LLMs) to perform legal tasks in the Indian landscape when social factors are involved. We present a novel metric, $β$-weighted $\textit{Legal Safety Score ($LSS_β$)}$, which encapsulates both the fairness and accuracy aspects of the LLM. We assess LLMs' safety by considering its performance in the $\textit{Binary Statutory Reasoning}$ task and its fairness exhibition with respect to various axes of disparities in the Indian society. Task performance and fairness scores of LLaMA and LLaMA--2 models indicate that the proposed $LSS_β$ metric can effectively determine the readiness of a model for safe usage in the legal sector. We also propose finetuning pipelines, utilising specialised legal datasets, as a potential method to mitigate bias and improve model safety. The finetuning procedures on LLaMA and LLaMA--2 models increase the $LSS_β$, improving their usability in the Indian legal domain. Our code is publicly released. △ Less

Submitted 17 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr… ▽ More This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI. △ Less

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2311.09784 [pdf, other]

doi 10.4204/EPTCS.395.8

Automatic Generation of Scenarios for System-level Simulation-based Verification of Autonomous Driving Systems

Authors: Srajan Goyal, Alberto Griggio, Jacob Kimblad, Stefano Tonetta

Abstract: With increasing complexity of Automated Driving Systems (ADS), ensuring their safety and reliability has become a critical challenge. The Verification and Validation (V&V) of these systems are particularly demanding when AI components are employed to implement perception and/or control functions. In ESA-funded project VIVAS, we developed a generic framework for system-level simulation-based V&V of… ▽ More With increasing complexity of Automated Driving Systems (ADS), ensuring their safety and reliability has become a critical challenge. The Verification and Validation (V&V) of these systems are particularly demanding when AI components are employed to implement perception and/or control functions. In ESA-funded project VIVAS, we developed a generic framework for system-level simulation-based V&V of autonomous systems. The approach is based on a simulation model of the system, an abstract model that describes symbolically the system behavior, and formal methods to generate scenarios and verify the simulation executions. Various coverage criteria can be defined to guide the automated generation of the scenarios. In this paper, we describe the instantiation of the VIVAS framework for an ADS case study. This is based on the integration of CARLA, a widely-used driving simulator, and its ScenarioRunner tool, which enables the creation of diverse and complex driving scenarios. This is also used in the CARLA Autonomous Driving Challenge to validate different ADS agents for perception and control based on AI, shared by the CARLA community. We describe the development of an abstract ADS model and the formulation of a coverage criterion that focuses on the behaviors of vehicles relative to the vehicle with ADS under verification. Leveraging the VIVAS framework, we generate and execute various driving scenarios, thus testing the capabilities of the AI components. The results show the effectiveness of VIVAS in automatically generating scenarios for system-level simulation-based V&V of an automated driving system using CARLA and ScenarioRunner. Therefore, they highlight the potential of the approach as a powerful tool in the future of ADS V&V methodologies. △ Less

Submitted 16 November, 2023; originally announced November 2023.

Comments: In Proceedings FMAS 2023, arXiv:2311.08987

Journal ref: EPTCS 395, 2023, pp. 113-129

arXiv:2310.10294 [pdf, other]

Key-phrase boosted unsupervised summary generation for FinTech organization

Authors: Aadit Deshpande, Shreya Goyal, Prateek Nagwanshi, Avinash Tripathy

Abstract: With the recent advances in social media, the use of NLP techniques in social media data analysis has become an emerging research direction. Business organizations can particularly benefit from such an analysis of social media discourse, providing an external perspective on consumer behavior. Some of the NLP applications such as intent detection, sentiment classification, text summarization can he… ▽ More With the recent advances in social media, the use of NLP techniques in social media data analysis has become an emerging research direction. Business organizations can particularly benefit from such an analysis of social media discourse, providing an external perspective on consumer behavior. Some of the NLP applications such as intent detection, sentiment classification, text summarization can help FinTech organizations to utilize the social media language data to find useful external insights and can be further utilized for downstream NLP tasks. Particularly, a summary which highlights the intents and sentiments of the users can be very useful for these organizations to get an external perspective. This external perspective can help organizations to better manage their products, offers, promotional campaigns, etc. However, certain challenges, such as a lack of labeled domain-specific datasets impede further exploration of these tasks in the FinTech domain. To overcome these challenges, we design an unsupervised phrase-based summary generation from social media data, using 'Action-Object' pairs (intent phrases). We evaluated the proposed method with other key-phrase based summary generation methods in the direction of contextual information of various Reddit discussion threads, available in the different summaries. We introduce certain "Context Metrics" such as the number of Unique words, Action-Object pairs, and Noun chunks to evaluate the contextual information retrieved from the source text in these phrase-based summaries. We demonstrate that our methods significantly outperform the baseline on these metrics, thus providing a qualitative and quantitative measure of their efficacy. Proposed framework has been leveraged as a web utility portal hosted within Amex. △ Less

Submitted 16 October, 2023; originally announced October 2023.

Comments: 8 pages, 4 figures

arXiv:2310.02372 [pdf, other]

ProtoNER: Few shot Incremental Learning for Named Entity Recognition using Prototypical Networks

Authors: Ritesh Kumar, Saurabh Goyal, Ashish Verma, Vatche Isahagian

Abstract: Key value pair (KVP) extraction or Named Entity Recognition(NER) from visually rich documents has been an active area of research in document understanding and data extraction domain. Several transformer based models such as LayoutLMv2, LayoutLMv3, and LiLT have emerged achieving state of the art results. However, addition of even a single new class to the existing model requires (a) re-annotation… ▽ More Key value pair (KVP) extraction or Named Entity Recognition(NER) from visually rich documents has been an active area of research in document understanding and data extraction domain. Several transformer based models such as LayoutLMv2, LayoutLMv3, and LiLT have emerged achieving state of the art results. However, addition of even a single new class to the existing model requires (a) re-annotation of entire training dataset to include this new class and (b) retraining the model again. Both of these issues really slow down the deployment of updated model. \\ We present \textbf{ProtoNER}: Prototypical Network based end-to-end KVP extraction model that allows addition of new classes to an existing model while requiring minimal number of newly annotated training samples. The key contributions of our model are: (1) No dependency on dataset used for initial training of the model, which alleviates the need to retain original training dataset for longer duration as well as data re-annotation which is very time consuming task, (2) No intermediate synthetic data generation which tends to add noise and results in model's performance degradation, and (3) Hybrid loss function which allows model to retain knowledge about older classes as well as learn about newly added classes.\\ Experimental results show that ProtoNER finetuned with just 30 samples is able to achieve similar results for the newly added classes as that of regular model finetuned with 2600 samples. △ Less

Submitted 3 October, 2023; originally announced October 2023.

arXiv:2310.02226 [pdf, other]

Think before you speak: Training Language Models With Pause Tokens

Authors: Sachin Goyal, Ziwei Ji, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar, Vaishnavh Nagarajan

Abstract: Language models generate responses by producing a series of tokens in immediate succession: the $(K+1)^{th}$ token is an outcome of manipulating $K$ hidden vectors per layer, one vector per preceding token. What if instead we were to let the model manipulate say, $K+10$ hidden vectors, before it outputs the $(K+1)^{th}$ token? We operationalize this idea by performing training and inference on lan… ▽ More Language models generate responses by producing a series of tokens in immediate succession: the $(K+1)^{th}$ token is an outcome of manipulating $K$ hidden vectors per layer, one vector per preceding token. What if instead we were to let the model manipulate say, $K+10$ hidden vectors, before it outputs the $(K+1)^{th}$ token? We operationalize this idea by performing training and inference on language models with a (learnable) $\textit{pause}$ token, a sequence of which is appended to the input prefix. We then delay extracting the model's outputs until the last pause token is seen, thereby allowing the model to process extra computation before committing to an answer. We empirically evaluate $\textit{pause-training}$ on decoder-only models of 1B and 130M parameters with causal pretraining on C4, and on downstream tasks covering reasoning, question-answering, general understanding and fact recall. Our main finding is that inference-time delays show gains when the model is both pre-trained and finetuned with delays. For the 1B model, we witness gains on 8 of 9 tasks, most prominently, a gain of $18\%$ EM score on the QA task of SQuAD, $8\%$ on CommonSenseQA and $1\%$ accuracy on the reasoning task of GSM8k. Our work raises a range of conceptual and practical future research questions on making delayed next-token prediction a widely applicable new paradigm. △ Less

Submitted 20 April, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

Comments: Published at ICLR 2024

arXiv:2308.13204 [pdf, other]

Self-supervised learning for hotspot detection and isolation from thermal images

Authors: Shreyas Goyal, Jagath C. Rajapakse

Abstract: Hotspot detection using thermal imaging has recently become essential in several industrial applications, such as security applications, health applications, and equipment monitoring applications. Hotspot detection is of utmost importance in industrial safety where equipment can develop anomalies. Hotspots are early indicators of such anomalies. We address the problem of hotspot detection in therm… ▽ More Hotspot detection using thermal imaging has recently become essential in several industrial applications, such as security applications, health applications, and equipment monitoring applications. Hotspot detection is of utmost importance in industrial safety where equipment can develop anomalies. Hotspots are early indicators of such anomalies. We address the problem of hotspot detection in thermal images by proposing a self-supervised learning approach. Self-supervised learning has shown potential as a competitive alternative to their supervised learning counterparts but their application to thermography has been limited. This has been due to lack of diverse data availability, domain specific pre-trained models, standardized benchmarks, etc. We propose a self-supervised representation learning approach followed by fine-tuning that improves detection of hotspots by classification. The SimSiam network based ensemble classifier decides whether an image contains hotspots or not. Detection of hotspots is followed by precise hotspot isolation. By doing so, we are able to provide a highly accurate and precise hotspot identification, applicable to a wide range of applications. We created a novel large thermal image dataset to address the issue of paucity of easily accessible thermal images. Our experiments with the dataset created by us and a publicly available segmentation dataset show the potential of our approach for hotspot detection and its ability to isolate hotspots with high accuracy. We achieve a Dice Coefficient of 0.736, the highest when compared with existing hotspot identification techniques. Our experiments also show self-supervised learning as a strong contender of supervised learning, providing competitive metrics for hotspot detection, with the highest accuracy of our approach being 97%. △ Less

Submitted 25 August, 2023; originally announced August 2023.

arXiv:2307.03132 [pdf, other]

T-MARS: Improving Visual Representations by Circumventing Text Feature Learning

Authors: Pratyush Maini, Sachin Goyal, Zachary C. Lipton, J. Zico Kolter, Aditi Raghunathan

Abstract: Large web-sourced multimodal datasets have powered a slew of new methods for learning general-purpose visual representations, advancing the state of the art in computer vision and revolutionizing zero- and few-shot recognition. One crucial decision facing practitioners is how, if at all, to curate these ever-larger datasets. For example, the creators of the LAION-5B dataset chose to retain only im… ▽ More Large web-sourced multimodal datasets have powered a slew of new methods for learning general-purpose visual representations, advancing the state of the art in computer vision and revolutionizing zero- and few-shot recognition. One crucial decision facing practitioners is how, if at all, to curate these ever-larger datasets. For example, the creators of the LAION-5B dataset chose to retain only image-caption pairs whose CLIP similarity score exceeded a designated threshold. In this paper, we propose a new state-of-the-art data filtering approach motivated by our observation that nearly 40% of LAION's images contain text that overlaps significantly with the caption. Intuitively, such data could be wasteful as it incentivizes models to perform optical character recognition rather than learning visual features. However, naively removing all such data could also be wasteful, as it throws away images that contain visual features (in addition to overlapping text). Our simple and scalable approach, T-MARS (Text Masking and Re-Scoring), filters out only those pairs where the text dominates the remaining visual features -- by first masking out the text and then filtering out those with a low CLIP similarity score of the masked image. Experimentally, T-MARS outperforms the top-ranked method on the "medium scale" of DataComp (a data filtering benchmark) by a margin of 6.5% on ImageNet and 4.7% on VTAB. Additionally, our systematic evaluation on various data pool sizes from 2M to 64M shows that the accuracy gains enjoyed by T-MARS linearly increase as data and compute are scaled exponentially. Code is available at https://github.com/locuslab/T-MARS. △ Less

Submitted 18 March, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

Comments: Accepted to ICLR 2024. Oral at ICCV Datacomp 2023

arXiv:2306.14939 [pdf, other]

The Art of Embedding Fusion: Optimizing Hate Speech Detection

Authors: Mohammad Aflah Khan, Neemesh Yadav, Mohit Jain, Sanyam Goyal

Abstract: Hate speech detection is a challenging natural language processing task that requires capturing linguistic and contextual nuances. Pre-trained language models (PLMs) offer rich semantic representations of text that can improve this task. However there is still limited knowledge about ways to effectively combine representations across PLMs and leverage their complementary strengths. In this work, w… ▽ More Hate speech detection is a challenging natural language processing task that requires capturing linguistic and contextual nuances. Pre-trained language models (PLMs) offer rich semantic representations of text that can improve this task. However there is still limited knowledge about ways to effectively combine representations across PLMs and leverage their complementary strengths. In this work, we shed light on various combination techniques for several PLMs and comprehensively analyze their effectiveness. Our findings show that combining embeddings leads to slight improvements but at a high computational cost and the choice of combination has marginal effect on the final outcome. We also make our codebase public at https://github.com/aflah02/The-Art-of-Embedding-Fusion-Optimizing-Hate-Speech-Detection . △ Less

Submitted 8 October, 2023; v1 submitted 26 June, 2023; originally announced June 2023.

Comments: Published as a Tiny Paper at ICLR 2023, 12 Pages

arXiv:2306.13841 [pdf, other]

Is Pre-training Truly Better Than Meta-Learning?

Authors: Brando Miranda, Patrick Yu, Saumya Goyal, Yu-Xiong Wang, Sanmi Koyejo

Abstract: In the context of few-shot learning, it is currently believed that a fixed pre-trained (PT) model, along with fine-tuning the final layer during evaluation, outperforms standard meta-learning algorithms. We re-evaluate these claims under an in-depth empirical examination of an extensive set of formally diverse datasets and compare PT to Model Agnostic Meta-Learning (MAML). Unlike previous work, we… ▽ More In the context of few-shot learning, it is currently believed that a fixed pre-trained (PT) model, along with fine-tuning the final layer during evaluation, outperforms standard meta-learning algorithms. We re-evaluate these claims under an in-depth empirical examination of an extensive set of formally diverse datasets and compare PT to Model Agnostic Meta-Learning (MAML). Unlike previous work, we emphasize a fair comparison by using: the same architecture, the same optimizer, and all models trained to convergence. Crucially, we use a more rigorous statistical tool -- the effect size (Cohen's d) -- to determine the practical significance of the difference between a model trained with PT vs. a MAML. We then use a previously proposed metric -- the diversity coefficient -- to compute the average formal diversity of a dataset. Using this analysis, we demonstrate the following: 1. when the formal diversity of a data set is low, PT beats MAML on average and 2. when the formal diversity is high, MAML beats PT on average. The caveat is that the magnitude of the average difference between a PT vs. MAML using the effect size is low (according to classical statistical thresholds) -- less than 0.2. Nevertheless, this observation is contrary to the currently held belief that a pre-trained model is always better than a meta-learning model. Our extensive experiments consider 21 few-shot learning benchmarks, including the large-scale few-shot learning dataset Meta-Data set. We also show no significant difference between a MAML model vs. a PT model with GPT-2 on Openwebtext. We, therefore, conclude that a pre-trained model does not always beat a meta-learned model and that the formal diversity of a dataset is a driving factor. △ Less

Submitted 23 June, 2023; originally announced June 2023.

Journal ref: Proceedings of the 40th International Conference on Machine Learning 2023 DMLR Workshop

arXiv:2303.11761 [pdf, other]

Reasonable Scale Machine Learning with Open-Source Metaflow

Authors: Jacopo Tagliabue, Hugo Bowne-Anderson, Ville Tuulos, Savin Goyal, Romain Cledat, David Berg

Abstract: As Machine Learning (ML) gains adoption across industries and new use cases, practitioners increasingly realize the challenges around effectively developing and iterating on ML systems: reproducibility, debugging, scalability, and documentation are elusive goals for real-world pipelines outside tech-first companies. In this paper, we review the nature of ML-oriented workloads and argue that re-pur… ▽ More As Machine Learning (ML) gains adoption across industries and new use cases, practitioners increasingly realize the challenges around effectively developing and iterating on ML systems: reproducibility, debugging, scalability, and documentation are elusive goals for real-world pipelines outside tech-first companies. In this paper, we review the nature of ML-oriented workloads and argue that re-purposing existing tools won't solve the current productivity issues, as ML peculiarities warrant specialized development tooling. We then introduce Metaflow, an open-source framework for ML projects explicitly designed to boost the productivity of data practitioners by abstracting away the execution of ML code from the definition of the business logic. We show how our design addresses the main challenges in ML operations (MLOps), and document through examples, interviews and use cases its practical impact on the field. △ Less

Submitted 21 March, 2023; originally announced March 2023.

arXiv:2303.11548 [pdf, other]

Emotionally Enhanced Talking Face Generation

Authors: Sahil Goyal, Shagun Uppal, Sarthak Bhagat, Yi Yu, Yifang Yin, Rajiv Ratn Shah

Abstract: Several works have developed end-to-end pipelines for generating lip-synced talking faces with various real-world applications, such as teaching and language translation in videos. However, these prior works fail to create realistic-looking videos since they focus little on people's expressions and emotions. Moreover, these methods' effectiveness largely depends on the faces in the training datase… ▽ More Several works have developed end-to-end pipelines for generating lip-synced talking faces with various real-world applications, such as teaching and language translation in videos. However, these prior works fail to create realistic-looking videos since they focus little on people's expressions and emotions. Moreover, these methods' effectiveness largely depends on the faces in the training dataset, which means they may not perform well on unseen faces. To mitigate this, we build a talking face generation framework conditioned on a categorical emotion to generate videos with appropriate expressions, making them more realistic and convincing. With a broad range of six emotions, i.e., \emph{happiness}, \emph{sadness}, \emph{fear}, \emph{anger}, \emph{disgust}, and \emph{neutral}, we show that our model can adapt to arbitrary identities, emotions, and languages. Our proposed framework is equipped with a user-friendly web interface with a real-time experience for talking face generation with emotions. We also conduct a user study for subjective evaluation of our interface's usability, design, and functionality. Project page: https://midas.iiitd.edu.in/emo/ △ Less

Submitted 26 March, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

arXiv:2303.07247 [pdf]

Are Models Trained on Indian Legal Data Fair?

Authors: Sahil Girhepuje, Anmol Goel, Gokul S Krishnan, Shreya Goyal, Satyendra Pandey, Ponnurangam Kumaraguru, Balaraman Ravindran

Abstract: Recent advances and applications of language technology and artificial intelligence have enabled much success across multiple domains like law, medical and mental health. AI-based Language Models, like Judgement Prediction, have recently been proposed for the legal sector. However, these models are strife with encoded social biases picked up from the training data. While bias and fairness have bee… ▽ More Recent advances and applications of language technology and artificial intelligence have enabled much success across multiple domains like law, medical and mental health. AI-based Language Models, like Judgement Prediction, have recently been proposed for the legal sector. However, these models are strife with encoded social biases picked up from the training data. While bias and fairness have been studied across NLP, most studies primarily locate themselves within a Western context. In this work, we present an initial investigation of fairness from the Indian perspective in the legal domain. We highlight the propagation of learnt algorithmic biases in the bail prediction task for models trained on Hindi legal documents. We evaluate the fairness gap using demographic parity and show that a decision tree model trained for the bail prediction task has an overall fairness disparity of 0.237 between input features associated with Hindus and Muslims. Additionally, we highlight the need for further research and studies in the avenues of fairness/bias in applying AI in the legal sector with a specific focus on the Indian context. △ Less

Submitted 14 May, 2024; v1 submitted 13 March, 2023; originally announced March 2023.

Comments: Presented at the Symposium on AI and Law (SAIL) 2023

arXiv:2302.08624 [pdf, other]

InstructABSA: Instruction Learning for Aspect Based Sentiment Analysis

Authors: Kevin Scaria, Himanshu Gupta, Siddharth Goyal, Saurabh Arjun Sawant, Swaroop Mishra, Chitta Baral

Abstract: We introduce InstructABSA, an instruction learning paradigm for Aspect-Based Sentiment Analysis (ABSA) subtasks. Our method introduces positive, negative, and neutral examples to each training sample, and instruction tune the model (Tk-Instruct) for ABSA subtasks, yielding significant performance improvements. Experimental results on the Sem Eval 2014, 15, and 16 datasets demonstrate that Instruct… ▽ More We introduce InstructABSA, an instruction learning paradigm for Aspect-Based Sentiment Analysis (ABSA) subtasks. Our method introduces positive, negative, and neutral examples to each training sample, and instruction tune the model (Tk-Instruct) for ABSA subtasks, yielding significant performance improvements. Experimental results on the Sem Eval 2014, 15, and 16 datasets demonstrate that InstructABSA outperforms the previous state-of-the-art (SOTA) approaches on Term Extraction (ATE), Sentiment Classification(ATSC) and Sentiment Pair Extraction (ASPE) subtasks. In particular, InstructABSA outperforms the previous state-of-the-art (SOTA) on the Rest14 ATE subtask by 5.69% points, the Rest15 ATSC subtask by 9.59% points, and the Lapt14 AOPE subtask by 3.37% points, surpassing 7x larger models. We also get competitive results on AOOE, AOPE, and AOSTE subtasks indicating strong generalization ability to all subtasks. Exploring sample efficiency reveals that just 50% train data is required to get competitive results with other instruction tuning approaches. Lastly, we assess the quality of instructions and observe that InstructABSA's performance experiences a decline of ~10% when adding misleading examples. △ Less

Submitted 13 November, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

Comments: 4 pages, 3 figures, 9 tables, 9 appendix pages

arXiv:2301.05150 [pdf, other]

Unsupervised Question Duplicate and Related Questions Detection in e-learning platforms

Authors: Maksimjeet Chowdhary, Sanyam Goyal, Venktesh V, Mukesh Mohania, Vikram Goyal

Abstract: Online learning platforms provide diverse questions to gauge the learners' understanding of different concepts. The repository of questions has to be constantly updated to ensure a diverse pool of questions to conduct assessments for learners. However, it is impossible for the academician to manually skim through the large repository of questions to check for duplicates when onboarding new questio… ▽ More Online learning platforms provide diverse questions to gauge the learners' understanding of different concepts. The repository of questions has to be constantly updated to ensure a diverse pool of questions to conduct assessments for learners. However, it is impossible for the academician to manually skim through the large repository of questions to check for duplicates when onboarding new questions from external sources. Hence, we propose a tool QDup in this paper that can surface near-duplicate and semantically related questions without any supervised data. The proposed tool follows an unsupervised hybrid pipeline of statistical and neural approaches for incorporating different nuances in similarity for the task of question duplicate detection. We demonstrate that QDup can detect near-duplicate questions and also suggest related questions for practice with remarkable accuracy and speed from a large repository of questions. The demo video of the tool can be found at https://www.youtube.com/watch?v=loh0_-7XLW4. △ Less

Submitted 20 December, 2022; originally announced January 2023.

Comments: 4 pages accepted as demo paper at WSDM 2023

arXiv:2212.05409 [pdf, other]

Towards Leaving No Indic Language Behind: Building Monolingual Corpora, Benchmark and Models for Indic Languages

Authors: Sumanth Doddapaneni, Rahul Aralikatte, Gowtham Ramesh, Shreya Goyal, Mitesh M. Khapra, Anoop Kunchukuttan, Pratyush Kumar

Abstract: Building Natural Language Understanding (NLU) capabilities for Indic languages, which have a collective speaker base of more than one billion speakers is absolutely crucial. In this work, we aim to improve the NLU capabilities of Indic languages by making contributions along 3 important axes (i) monolingual corpora (ii) NLU testsets (iii) multilingual LLMs focusing on Indic languages. Specifically… ▽ More Building Natural Language Understanding (NLU) capabilities for Indic languages, which have a collective speaker base of more than one billion speakers is absolutely crucial. In this work, we aim to improve the NLU capabilities of Indic languages by making contributions along 3 important axes (i) monolingual corpora (ii) NLU testsets (iii) multilingual LLMs focusing on Indic languages. Specifically, we curate the largest monolingual corpora, IndicCorp, with 20.9B tokens covering 24 languages from 4 language families - a 2.3x increase over prior work, while supporting 12 additional languages. Next, we create a human-supervised benchmark, IndicXTREME, consisting of nine diverse NLU tasks covering 20 languages. Across languages and tasks, IndicXTREME contains a total of 105 evaluation sets, of which 52 are new contributions to the literature. To the best of our knowledge, this is the first effort towards creating a standard benchmark for Indic languages that aims to test the multilingual zero-shot capabilities of pretrained language models. Finally, we train IndicBERT v2, a state-of-the-art model supporting all the languages. Averaged across languages and tasks, the model achieves an absolute improvement of 2 points over a strong baseline. The data and models are available at https://github.com/AI4Bharat/IndicBERT. △ Less

Submitted 24 May, 2023; v1 submitted 10 December, 2022; originally announced December 2022.

Comments: ACL 2023

arXiv:2212.00638 [pdf, other]

Finetune like you pretrain: Improved finetuning of zero-shot vision models

Authors: Sachin Goyal, Ananya Kumar, Sankalp Garg, Zico Kolter, Aditi Raghunathan

Abstract: Finetuning image-text models such as CLIP achieves state-of-the-art accuracies on a variety of benchmarks. However, recent works like WiseFT (Wortsman et al., 2021) and LP-FT (Kumar et al., 2022) have shown that even subtle differences in the finetuning process can lead to surprisingly large differences in the final performance, both for in-distribution (ID) and out-of-distribution (OOD) data. In… ▽ More Finetuning image-text models such as CLIP achieves state-of-the-art accuracies on a variety of benchmarks. However, recent works like WiseFT (Wortsman et al., 2021) and LP-FT (Kumar et al., 2022) have shown that even subtle differences in the finetuning process can lead to surprisingly large differences in the final performance, both for in-distribution (ID) and out-of-distribution (OOD) data. In this work, we show that a natural and simple approach of mimicking contrastive pretraining consistently outperforms alternative finetuning approaches. Specifically, we cast downstream class labels as text prompts and continue optimizing the contrastive loss between image embeddings and class-descriptive prompt embeddings (contrastive finetuning). Our method consistently outperforms baselines across 7 distribution shifts, 6 transfer learning, and 3 few-shot learning benchmarks. On WILDS-iWILDCam, our proposed approach FLYP outperforms the top of the leaderboard by $2.3\%$ ID and $2.7\%$ OOD, giving the highest reported accuracy. Averaged across 7 OOD datasets (2 WILDS and 5 ImageNet associated shifts), FLYP gives gains of $4.2\%$ OOD over standard finetuning and outperforms the current state of the art (LP-FT) by more than $1\%$ both ID and OOD. Similarly, on 3 few-shot learning benchmarks, our approach gives gains up to $4.6\%$ over standard finetuning and $4.4\%$ over the state of the art. In total, these benchmarks establish contrastive finetuning as a simple, intuitive, and state-of-the-art approach for supervised finetuning of image-text models like CLIP. Code is available at https://github.com/locuslab/FLYP. △ Less

Submitted 1 December, 2022; originally announced December 2022.

Comments: 20 Pages, 7 Tables, 5 Figures

arXiv:2211.09174 [pdf, other]

CASPR: Customer Activity Sequence-based Prediction and Representation

Authors: Pin-Jung Chen, Sahil Bhatnagar, Sagar Goyal, Damian Konrad Kowalczyk, Mayank Shrivastava

Abstract: Tasks critical to enterprise profitability, such as customer churn prediction, fraudulent account detection or customer lifetime value estimation, are often tackled by models trained on features engineered from customer data in tabular format. Application-specific feature engineering adds development, operationalization and maintenance costs over time. Recent advances in representation learning pr… ▽ More Tasks critical to enterprise profitability, such as customer churn prediction, fraudulent account detection or customer lifetime value estimation, are often tackled by models trained on features engineered from customer data in tabular format. Application-specific feature engineering adds development, operationalization and maintenance costs over time. Recent advances in representation learning present an opportunity to simplify and generalize feature engineering across applications. When applying these advancements to tabular data researchers deal with data heterogeneity, variations in customer engagement history or the sheer volume of enterprise datasets. In this paper, we propose a novel approach to encode tabular data containing customer transactions, purchase history and other interactions into a generic representation of a customer's association with the business. We then evaluate these embeddings as features to train multiple models spanning a variety of applications. CASPR, Customer Activity Sequence-based Prediction and Representation, applies Transformer architecture to encode activity sequences to improve model performance and avoid bespoke feature engineering across applications. Our experiments at scale validate CASPR for both small and large enterprise applications. △ Less

Submitted 28 November, 2022; v1 submitted 16 November, 2022; originally announced November 2022.

Comments: Presented at the Table Representation Learning Workshop, NeurIPS 2022, New Orleans. Authors listed in random order

arXiv:2210.07471 [pdf, other]

"John is 50 years old, can his son be 65?" Evaluating NLP Models' Understanding of Feasibility

Authors: Himanshu Gupta, Neeraj Varshney, Swaroop Mishra, Kuntal Kumar Pal, Saurabh Arjun Sawant, Kevin Scaria, Siddharth Goyal, Chitta Baral

Abstract: In current NLP research, large-scale language models and their abilities are widely being discussed. Some recent works have also found notable failures of these models. Often these failure examples involve complex reasoning abilities. This work focuses on a simple commonsense ability, reasoning about when an action (or its effect) is feasible. To this end, we introduce FeasibilityQA, a question-an… ▽ More In current NLP research, large-scale language models and their abilities are widely being discussed. Some recent works have also found notable failures of these models. Often these failure examples involve complex reasoning abilities. This work focuses on a simple commonsense ability, reasoning about when an action (or its effect) is feasible. To this end, we introduce FeasibilityQA, a question-answering dataset involving binary classification (BCQ) and multi-choice multi-correct questions (MCQ) that test understanding of feasibility. We show that even state-of-the-art models such as GPT-3, GPT-2, and T5 struggle to answer the feasibility questions correctly. Specifically, on MCQ and BCQ questions, GPT-3 achieves an accuracy of just (19%, 62%) and (25%, 64%) in zero-shot and few-shot settings, respectively. We also evaluate models by providing relevant knowledge statements required to answer the question. We find that the additional knowledge leads to a 7% gain in performance, but the overall performance still remains low. These results make one wonder how much commonsense knowledge about action feasibility is encoded in state-of-the-art models and how well they can reason about it. △ Less

Submitted 2 February, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

Comments: EACL 2023

arXiv:2207.09640 [pdf, other]

Test-Time Adaptation via Conjugate Pseudo-labels

Authors: Sachin Goyal, Mingjie Sun, Aditi Raghunathan, Zico Kolter

Abstract: Test-time adaptation (TTA) refers to adapting neural networks to distribution shifts, with access to only the unlabeled test samples from the new domain at test-time. Prior TTA methods optimize over unsupervised objectives such as the entropy of model predictions in TENT [Wang et al., 2021], but it is unclear what exactly makes a good TTA loss. In this paper, we start by presenting a surprising ph… ▽ More Test-time adaptation (TTA) refers to adapting neural networks to distribution shifts, with access to only the unlabeled test samples from the new domain at test-time. Prior TTA methods optimize over unsupervised objectives such as the entropy of model predictions in TENT [Wang et al., 2021], but it is unclear what exactly makes a good TTA loss. In this paper, we start by presenting a surprising phenomenon: if we attempt to meta-learn the best possible TTA loss over a wide class of functions, then we recover a function that is remarkably similar to (a temperature-scaled version of) the softmax-entropy employed by TENT. This only holds, however, if the classifier we are adapting is trained via cross-entropy; if trained via squared loss, a different best TTA loss emerges. To explain this phenomenon, we analyze TTA through the lens of the training losses's convex conjugate. We show that under natural conditions, this (unsupervised) conjugate function can be viewed as a good local approximation to the original supervised loss and indeed, it recovers the best losses found by meta-learning. This leads to a generic recipe that can be used to find a good TTA loss for any given supervised training loss function of a general class. Empirically, our approach consistently dominates other baselines over a wide range of benchmarks. Our approach is particularly of interest when applied to classifiers trained with novel loss functions, e.g., the recently-proposed PolyLoss, where it differs substantially from (and outperforms) an entropy-based loss. Further, we show that our approach can also be interpreted as a kind of self-training using a very specific soft label, which we refer to as the conjugate pseudolabel. Overall, our method provides a broad framework for better understanding and improving test-time adaptation. Code is available at https://github.com/locuslab/tta_conjugate. △ Less

Submitted 22 November, 2022; v1 submitted 20 July, 2022; originally announced July 2022.

Comments: Published in Neural Information Processing Systems (NeurIPS) 2022

arXiv:2206.08917 [pdf, other]

doi 10.1021/acscatal.2c05426

The Open Catalyst 2022 (OC22) Dataset and Challenges for Oxide Electrocatalysts

Authors: Richard Tran, Janice Lan, Muhammed Shuaibi, Brandon M. Wood, Siddharth Goyal, Abhishek Das, Javier Heras-Domingo, Adeesh Kolluru, Ammar Rizvi, Nima Shoghi, Anuroop Sriram, Felix Therrien, Jehad Abed, Oleksandr Voznyy, Edward H. Sargent, Zachary Ulissi, C. Lawrence Zitnick

Abstract: The development of machine learning models for electrocatalysts requires a broad set of training data to enable their use across a wide variety of materials. One class of materials that currently lacks sufficient training data is oxides, which are critical for the development of OER catalysts. To address this, we developed the OC22 dataset, consisting of 62,331 DFT relaxations (~9,854,504 single p… ▽ More The development of machine learning models for electrocatalysts requires a broad set of training data to enable their use across a wide variety of materials. One class of materials that currently lacks sufficient training data is oxides, which are critical for the development of OER catalysts. To address this, we developed the OC22 dataset, consisting of 62,331 DFT relaxations (~9,854,504 single point calculations) across a range of oxide materials, coverages, and adsorbates. We define generalized total energy tasks that enable property prediction beyond adsorption energies; we test baseline performance of several graph neural networks; and we provide pre-defined dataset splits to establish clear benchmarks for future efforts. In the most general task, GemNet-OC sees a ~36% improvement in energy predictions when combining the chemically dissimilar OC20 and OC22 datasets via fine-tuning. Similarly, we achieved a ~19% improvement in total energy predictions on OC20 and a ~9% improvement in force predictions in OC22 when using joint training. We demonstrate the practical utility of a top performing model by capturing literature adsorption energies and important OER scaling relationships. We expect OC22 to provide an important benchmark for models seeking to incorporate intricate long-range electrostatic and magnetic interactions in oxide surfaces. Dataset and baseline models are open sourced, and a public leaderboard is available to encourage continued community developments on the total energy tasks and data. △ Less

Submitted 7 March, 2023; v1 submitted 17 June, 2022; originally announced June 2022.

Comments: 50 pages, 14 figures

arXiv:2206.08564 [pdf, other]

MET: Masked Encoding for Tabular Data

Authors: Kushal Majmundar, Sachin Goyal, Praneeth Netrapalli, Prateek Jain

Abstract: We consider the task of self-supervised representation learning (SSL) for tabular data: tabular-SSL. Typical contrastive learning based SSL methods require instance-wise data augmentations which are difficult to design for unstructured tabular data. Existing tabular-SSL methods design such augmentations in a relatively ad-hoc fashion and can fail to capture the underlying data manifold. Instead of… ▽ More We consider the task of self-supervised representation learning (SSL) for tabular data: tabular-SSL. Typical contrastive learning based SSL methods require instance-wise data augmentations which are difficult to design for unstructured tabular data. Existing tabular-SSL methods design such augmentations in a relatively ad-hoc fashion and can fail to capture the underlying data manifold. Instead of augmentations based approaches for tabular-SSL, we propose a new reconstruction based method, called Masked Encoding for Tabular Data (MET), that does not require augmentations. MET is based on the popular MAE approach for vision-SSL [He et al., 2021] and uses two key ideas: (i) since each coordinate in a tabular dataset has a distinct meaning, we need to use separate representations for all coordinates, and (ii) using an adversarial reconstruction loss in addition to the standard one. Empirical results on five diverse tabular datasets show that MET achieves a new state of the art (SOTA) on all of these datasets and improves up to 9% over current SOTA methods. We shed more light on the working of MET via experiments on carefully designed simple datasets. △ Less

Submitted 17 June, 2022; originally announced June 2022.

Comments: Under Review, 18 pages, 6 Tables, 4 Figures

arXiv:2203.09697 [pdf, other]

Towards Training Billion Parameter Graph Neural Networks for Atomic Simulations

Authors: Anuroop Sriram, Abhishek Das, Brandon M. Wood, Siddharth Goyal, C. Lawrence Zitnick

Abstract: Recent progress in Graph Neural Networks (GNNs) for modeling atomic simulations has the potential to revolutionize catalyst discovery, which is a key step in making progress towards the energy breakthroughs needed to combat climate change. However, the GNNs that have proven most effective for this task are memory intensive as they model higher-order interactions in the graphs such as those between… ▽ More Recent progress in Graph Neural Networks (GNNs) for modeling atomic simulations has the potential to revolutionize catalyst discovery, which is a key step in making progress towards the energy breakthroughs needed to combat climate change. However, the GNNs that have proven most effective for this task are memory intensive as they model higher-order interactions in the graphs such as those between triplets or quadruplets of atoms, making it challenging to scale these models. In this paper, we introduce Graph Parallelism, a method to distribute input graphs across multiple GPUs, enabling us to train very large GNNs with hundreds of millions or billions of parameters. We empirically evaluate our method by scaling up the number of parameters of the recently proposed DimeNet++ and GemNet models by over an order of magnitude. On the large-scale Open Catalyst 2020 (OC20) dataset, these graph-parallelized models lead to relative improvements of 1) 15% on the force MAE metric for the S2EF task and 2) 21% on the AFbT metric for the IS2RS task, establishing new state-of-the-art results. △ Less

Submitted 17 March, 2022; originally announced March 2022.

Comments: ICLR 2022

arXiv:2203.06414 [pdf, other]

A Survey of Adversarial Defences and Robustness in NLP

Authors: Shreya Goyal, Sumanth Doddapaneni, Mitesh M. Khapra, Balaraman Ravindran

Abstract: In the past few years, it has become increasingly evident that deep neural networks are not resilient enough to withstand adversarial perturbations in input data, leaving them vulnerable to attack. Various authors have proposed strong adversarial attacks for computer vision and Natural Language Processing (NLP) tasks. As a response, many defense mechanisms have also been proposed to prevent these… ▽ More In the past few years, it has become increasingly evident that deep neural networks are not resilient enough to withstand adversarial perturbations in input data, leaving them vulnerable to attack. Various authors have proposed strong adversarial attacks for computer vision and Natural Language Processing (NLP) tasks. As a response, many defense mechanisms have also been proposed to prevent these networks from failing. The significance of defending neural networks against adversarial attacks lies in ensuring that the model's predictions remain unchanged even if the input data is perturbed. Several methods for adversarial defense in NLP have been proposed, catering to different NLP tasks such as text classification, named entity recognition, and natural language inference. Some of these methods not only defend neural networks against adversarial attacks but also act as a regularization mechanism during training, saving the model from overfitting. This survey aims to review the various methods proposed for adversarial defenses in NLP over the past few years by introducing a novel taxonomy. The survey also highlights the fragility of advanced deep neural networks in NLP and the challenges involved in defending them. △ Less

Submitted 18 April, 2023; v1 submitted 12 March, 2022; originally announced March 2022.

Comments: Accepted for publication at ACM Computing Surveys

arXiv:2109.14970 [pdf]

A Friend Recommendation System using Semantic Based KNN Algorithm

Authors: Srikantaiah K C, Salony Mewara, Sneha Goyal, Subhiksha S

Abstract: Social networking has become a major part of all our lives and we depend on it for day to day purposes. It is a medium that is used by people all around the world even in the smallest of towns. Its main purpose is to promote and aid communication between people. Social networks, such as Facebook, Twitter etc. were created for the sole purpose of helping individuals communicate about anything with… ▽ More Social networking has become a major part of all our lives and we depend on it for day to day purposes. It is a medium that is used by people all around the world even in the smallest of towns. Its main purpose is to promote and aid communication between people. Social networks, such as Facebook, Twitter etc. were created for the sole purpose of helping individuals communicate about anything with each other. These networks are becoming an important and also contemporary method to make friends from any part of this world. These new friends can communicate through any form of social media. Recommendation systems exist in all the social networks which aid users to find new friends and unite to more people and form associations and alliances with people. △ Less

Submitted 30 September, 2021; originally announced September 2021.

Journal ref: Journal of Seybold Report, VOLUME 15 ISSUE 9 2020 , page 1201-1209

arXiv:2104.07378 [pdf, other]

Tracking entities in technical procedures -- a new dataset and baselines

Authors: Saransh Goyal, Pratyush Pandey, Garima Gaur, Subhalingam D, Srikanta Bedathur, Maya Ramanath

Abstract: We introduce TechTrack, a new dataset for tracking entities in technical procedures. The dataset, prepared by annotating open domain articles from WikiHow, consists of 1351 procedures, e.g., "How to connect a printer", identifies more than 1200 unique entities with an average of 4.7 entities per procedure. We evaluate the performance of state-of-the-art models on the entity-tracking task and find… ▽ More We introduce TechTrack, a new dataset for tracking entities in technical procedures. The dataset, prepared by annotating open domain articles from WikiHow, consists of 1351 procedures, e.g., "How to connect a printer", identifies more than 1200 unique entities with an average of 4.7 entities per procedure. We evaluate the performance of state-of-the-art models on the entity-tracking task and find that they are well below the human annotation performance. We describe how TechTrack can be used to take forward the research on understanding procedures from temporal texts. △ Less

Submitted 15 April, 2021; originally announced April 2021.

arXiv:2103.14250 [pdf, other]

doi 10.1109/ACCESS.2021.3085085

Evaluation of deep learning models for multi-step ahead time series prediction

Authors: Rohitash Chandra, Shaurya Goyal, Rishabh Gupta

Abstract: Time series prediction with neural networks has been the focus of much research in the past few decades. Given the recent deep learning revolution, there has been much attention in using deep learning models for time series prediction, and hence it is important to evaluate their strengths and weaknesses. In this paper, we present an evaluation study that compares the performance of deep learning m… ▽ More Time series prediction with neural networks has been the focus of much research in the past few decades. Given the recent deep learning revolution, there has been much attention in using deep learning models for time series prediction, and hence it is important to evaluate their strengths and weaknesses. In this paper, we present an evaluation study that compares the performance of deep learning models for multi-step ahead time series prediction. The deep learning methods comprise simple recurrent neural networks, long short-term memory (LSTM) networks, bidirectional LSTM networks, encoder-decoder LSTM networks, and convolutional neural networks. We provide a further comparison with simple neural networks that use stochastic gradient descent and adaptive moment estimation (Adam) for training. We focus on univariate time series for multi-step-ahead prediction from benchmark time-series datasets and provide a further comparison of the results with related methods from the literature. The results show that the bidirectional and encoder-decoder LSTM network provides the best performance in accuracy for the given time series problems. △ Less

Submitted 7 June, 2021; v1 submitted 26 March, 2021; originally announced March 2021.

Journal ref: IEEE Access, 2021

arXiv:2103.08298 [pdf, other]

Knowledge driven Description Synthesis for Floor Plan Interpretation

Authors: Shreya Goyal, Chiranjoy Chattopadhyay, Gaurav Bhatnagar

Abstract: Image captioning is a widely known problem in the area of AI. Caption generation from floor plan images has applications in indoor path planning, real estate, and providing architectural solutions. Several methods have been explored in literature for generating captions or semi-structured descriptions from floor plan images. Since only the caption is insufficient to capture fine-grained details, r… ▽ More Image captioning is a widely known problem in the area of AI. Caption generation from floor plan images has applications in indoor path planning, real estate, and providing architectural solutions. Several methods have been explored in literature for generating captions or semi-structured descriptions from floor plan images. Since only the caption is insufficient to capture fine-grained details, researchers also proposed descriptive paragraphs from images. However, these descriptions have a rigid structure and lack flexibility, making it difficult to use them in real-time scenarios. This paper offers two models, Description Synthesis from Image Cue (DSIC) and Transformer Based Description Generation (TBDG), for the floor plan image to text generation to fill the gaps in existing methods. These two models take advantage of modern deep neural networks for visual feature extraction and text generation. The difference between both models is in the way they take input from the floor plan image. The DSIC model takes only visual features automatically extracted by a deep neural network, while the TBDG model learns textual captions extracted from input floor plan images with paragraphs. The specific keywords generated in TBDG and understanding them with paragraphs make it more robust in a general floor plan image. Experiments were carried out on a large-scale publicly available dataset and compared with state-of-the-art techniques to show the proposed model's superiority. △ Less

Submitted 15 March, 2021; originally announced March 2021.

Comments: 19 pages, 18 Figure

arXiv:2103.08297 [pdf, other]

GRIHA: Synthesizing 2-Dimensional Building Layouts from Images Captured using a Smart Phone

Authors: Shreya Goyal, Naimul Khan, Chiranjoy Chattopadhyay, Gaurav Bhatnagar

Abstract: Reconstructing an indoor scene and generating a layout/floor plan in 3D or 2D is a widely known problem. Quite a few algorithms have been proposed in the literature recently. However, most existing methods either use RGB-D images, thus requiring a depth camera, or depending on panoramic photos, assuming that there is little to no occlusion in the rooms. In this work, we proposed GRIHA (Generating… ▽ More Reconstructing an indoor scene and generating a layout/floor plan in 3D or 2D is a widely known problem. Quite a few algorithms have been proposed in the literature recently. However, most existing methods either use RGB-D images, thus requiring a depth camera, or depending on panoramic photos, assuming that there is little to no occlusion in the rooms. In this work, we proposed GRIHA (Generating Room Interior of a House using ARCore), a framework for generating a layout using an RGB image captured using a simple mobile phone camera. We take advantage of Simultaneous Localization and Mapping (SLAM) to assess the 3D transformations required for layout generation. SLAM technology is built-in in recent mobile libraries such as ARCore by Google. Hence, the proposed method is fast and efficient. It gives the user freedom to generate layout by merely taking a few conventional photos, rather than relying on specialized depth hardware or occlusion-free panoramic images. We have compared GRIHA with other existing methods and obtained superior results. Also, the system is tested on multiple hardware platforms to test the dependency and efficiency. △ Less

Submitted 15 March, 2021; originally announced March 2021.

Comments: 19 pages, 22 Figures, 4 Tables

arXiv:2103.03426 [pdf, other]

Target Localization using Bistatic and Multistatic Radar with 5G NR Waveform

Authors: O. Kanhere, S. Goyal, M. Beluri, T. S. Rappaport

Abstract: Joint communication and sensing allows the utilization of common spectral resources for communication and localization, reducing the cost of deployment. By using fifth generation (5G) New Radio (NR) (i.e., the 3rd Generation Partnership Project Radio Access Network for 5G) reference signals, conventionally used for communication, this paper shows sub-meter precision localization is possible at mil… ▽ More Joint communication and sensing allows the utilization of common spectral resources for communication and localization, reducing the cost of deployment. By using fifth generation (5G) New Radio (NR) (i.e., the 3rd Generation Partnership Project Radio Access Network for 5G) reference signals, conventionally used for communication, this paper shows sub-meter precision localization is possible at millimeter wave frequencies. We derive the geometric dilution of precision of a bistatic radar configuration, a theoretical metric that characterizes how the target location estimation error varies as a function of the bistatic geometry and measurement errors. We develop a 5G NR compliant software test bench to characterize the measurement errors when estimating the time difference of arrival and angle of arrival with 5G NR waveforms. The test bench is further utilized to demonstrate the accuracy of target localization and velocity estimation in several indoor and outdoor bistatic and multistatic configurations and to show that on average, the bistatic configuration can achieve a location accuracy of 10.0 cm over a bistatic range of 25 m, which can be further improved by deploying a multistatic radar configuration. △ Less

Submitted 4 March, 2021; originally announced March 2021.

Comments: IEEE 93rd Vehicular Technology Conference (VTC-Spring)

arXiv:2103.01436 [pdf, other]

ForceNet: A Graph Neural Network for Large-Scale Quantum Calculations

Authors: Weihua Hu, Muhammed Shuaibi, Abhishek Das, Siddharth Goyal, Anuroop Sriram, Jure Leskovec, Devi Parikh, C. Lawrence Zitnick

Abstract: With massive amounts of atomic simulation data available, there is a huge opportunity to develop fast and accurate machine learning models to approximate expensive physics-based calculations. The key quantity to estimate is atomic forces, where the state-of-the-art Graph Neural Networks (GNNs) explicitly enforce basic physical constraints such as rotation-covariance. However, to strictly satisfy t… ▽ More With massive amounts of atomic simulation data available, there is a huge opportunity to develop fast and accurate machine learning models to approximate expensive physics-based calculations. The key quantity to estimate is atomic forces, where the state-of-the-art Graph Neural Networks (GNNs) explicitly enforce basic physical constraints such as rotation-covariance. However, to strictly satisfy the physical constraints, existing models have to make tradeoffs between computational efficiency and model expressiveness. Here we explore an alternative approach. By not imposing explicit physical constraints, we can flexibly design expressive models while maintaining their computational efficiency. Physical constraints are implicitly imposed by training the models using physics-based data augmentation. To evaluate the approach, we carefully design a scalable and expressive GNN model, ForceNet, and apply it to OC20 (Chanussot et al., 2020), an unprecedentedly-large dataset of quantum physics calculations. Our proposed ForceNet is able to predict atomic forces more accurately than state-of-the-art physics-based GNNs while being faster both in training and inference. Overall, our promising and counter-intuitive results open up an exciting avenue for future research. △ Less

Submitted 1 March, 2021; originally announced March 2021.

arXiv:2102.13309 [pdf, other]

Discord and Harmony in Networks

Authors: Andrea Galeotti, Benjamin Golub, Sanjeev Goyal, Rithvik Rao

Abstract: Consider a coordination game played on a network, where agents prefer taking actions closer to those of their neighbors and to their own ideal points in action space. We explore how the welfare outcomes of a coordination game depend on network structure and the distribution of ideal points throughout the network. To this end, we imagine a benevolent or adversarial planner who intervenes, at a cost… ▽ More Consider a coordination game played on a network, where agents prefer taking actions closer to those of their neighbors and to their own ideal points in action space. We explore how the welfare outcomes of a coordination game depend on network structure and the distribution of ideal points throughout the network. To this end, we imagine a benevolent or adversarial planner who intervenes, at a cost, to change ideal points in order to maximize or minimize utilitarian welfare subject to a constraint. A complete characterization of optimal interventions is obtained by decomposing interventions into principal components of the network's adjacency matrix. Welfare is most sensitive to interventions proportional to the last principal component, which focus on local disagreement. A welfare-maximizing planner optimally works to reduce local disagreement, bringing the ideal points of neighbors closer together, whereas a malevolent adversary optimally drives neighbors' ideal points apart to decrease welfare. Such welfare-maximizing/minimizing interventions are very different from ones that would be done to change some traditional measures of discord, such as the cross-sectional variation of equilibrium actions. In fact, an adversary sowing disagreement to maximize her impact on welfare will minimize her impact on global variation in equilibrium actions, underscoring a tension between improving welfare and increasing global cohesion of equilibrium behavior. △ Less

Submitted 26 February, 2021; originally announced February 2021.

arXiv:2102.11849 [pdf]

Usability and Security of Different Authentication Methods for an Electronic Health Records System

Authors: Saptarshi Purkayastha, Shreya Goyal, Bolu Oluwalade, Tyler Phillips, Huanmei Wu, Xukai Zou

Abstract: We conducted a survey of 67 graduate students enrolled in the Privacy and Security in Healthcare course at Indiana University Purdue University Indianapolis. This was done to measure user preference and their understanding of usability and security of three different Electronic Health Records authentication methods: single authentication method (username and password), Single sign-on with Central… ▽ More We conducted a survey of 67 graduate students enrolled in the Privacy and Security in Healthcare course at Indiana University Purdue University Indianapolis. This was done to measure user preference and their understanding of usability and security of three different Electronic Health Records authentication methods: single authentication method (username and password), Single sign-on with Central Authentication Service (CAS) authentication method, and a bio-capsule facial authentication method. This research aims to explore the relationship between security and usability, and measure the effect of perceived security on usability in these three aforementioned authentication methods. We developed a formative-formative Partial Least Square Structural Equation Modeling (PLS-SEM) model to measure the relationship between the latent variables of Usability, and Security. The measurement model was developed using five observed variables (measures). - Efficiency and Effectiveness, Satisfaction, Preference, Concerns, and Confidence. The results obtained highlight the importance and impact of these measures on the latent variables and the relationship among the latent variables. From the PLS-SEM analysis, it was found that security has a positive impact on usability for Single sign-on and bio-capsule facial authentication methods. We conclude that the facial authentication method was the most secure and usable among the three authentication methods. Further, descriptive analysis was done to draw out the interesting findings from the survey regarding the observed variables. △ Less

Submitted 23 February, 2021; originally announced February 2021.

Comments: HEALTHINF21 at the 14th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2021)

arXiv:2101.01048 [pdf, other]

Reducing the Paging Overhead in Highly Directional Systems

Authors: Sanjay Goyal, Hussain Elkotby, Ravikumar Pragada, Tanbir Haque

Abstract: New Radio (NR) supports operations at high-frequency bands (e.g., millimeter-wave frequencies) by using narrow beam based directional transmissions to compensate high propagation losses at such frequencies. Due to the limited spatial coverage with each beam, the broadcast transmission of paging in NR is performed using beam sweeping, which takes multiple time slots. Thus, the paging procedure used… ▽ More New Radio (NR) supports operations at high-frequency bands (e.g., millimeter-wave frequencies) by using narrow beam based directional transmissions to compensate high propagation losses at such frequencies. Due to the limited spatial coverage with each beam, the broadcast transmission of paging in NR is performed using beam sweeping, which takes multiple time slots. Thus, the paging procedure used in NR would substantially increase the downlink resource overhead of the network with directional transmissions. Such overhead would further increase as we move higher in the frequency bands, such as terahertz bands, which is being viewed as one of the potential candidates for future generation networks. Therefore, the NR based paging solution is infeasible for supporting highly directional systems. In this paper, we propose a novel minimal feedback enabled paging mechanism, which instead of using all the beams for paging transmissions, only activates sub-set of beams having one or more UEs under the coverage. UE presence indications are implemented to identify the correct set of beams to be activated. Our analytical analysis and simulations show that the proposed solution significantly reduces the downlink paging overhead compared to the NR based solution (e.g., more than 80% gain for a system supporting 64 number of beams at a UE density of 200 UEs per paging occasion) while incurring minimal energy cost at the UE side. △ Less

Submitted 5 January, 2021; v1 submitted 4 January, 2021; originally announced January 2021.

Comments: 7 pages, 5 figures, accepted at IEEE VTC Spring 2021, April 2021, Helsinki, Finland

arXiv:2012.12835 [pdf, other]

doi 10.17605/OSF.IO/6Y8WG

Enabling Secure and Effective Biomedical Data Sharing through Cyberinfrastructure Gateways

Authors: Shreya Goyal, Saptarshi Purkayastha, Tyler Phillips, Rob Quick, Alexis Britt

Abstract: Dynaswap project reports on developing a coherently integrated and trustworthy holistic secure workflow protection architecture for cyberinfrastructures which can be used on virtual machines deployed through cyberinfrastructure (CI) services such as JetStream. This service creates a user-friendly cloud environment designed to give researchers access to interactive computing and data analysis resou… ▽ More Dynaswap project reports on developing a coherently integrated and trustworthy holistic secure workflow protection architecture for cyberinfrastructures which can be used on virtual machines deployed through cyberinfrastructure (CI) services such as JetStream. This service creates a user-friendly cloud environment designed to give researchers access to interactive computing and data analysis resources on demand. The Dynaswap cybersecurity architecture supports roles, role hierarchies, and data hierarchies, as well as dynamic changes of roles and hierarchical relations within the scientific infrastructure. Dynaswap combines existing cutting-edge security frameworks (including an Authentication Authorization-Accounting framework, Multi-Factor Authentication, Secure Digital Provenance, and Blockchain) with advanced security tools (e.g., Biometric-Capsule, Cryptography-based Hierarchical Access Control, and Dual-level Key Management). The CI is being validated in life-science research environments and in the education settings of Health Informatics. △ Less

Submitted 23 December, 2020; originally announced December 2020.

Comments: Presented at Gateways 2020, Online, USA, October 2020, see https://osf.io/meetings/gateways2020/

arXiv:2010.15947 [pdf, other]

PAL : Pretext-based Active Learning

Authors: Shubhang Bhatnagar, Sachin Goyal, Darshan Tank, Amit Sethi

Abstract: The goal of pool-based active learning is to judiciously select a fixed-sized subset of unlabeled samples from a pool to query an oracle for their labels, in order to maximize the accuracy of a supervised learner. However, the unsaid requirement that the oracle should always assign correct labels is unreasonable for most situations. We propose an active learning technique for deep neural networks… ▽ More The goal of pool-based active learning is to judiciously select a fixed-sized subset of unlabeled samples from a pool to query an oracle for their labels, in order to maximize the accuracy of a supervised learner. However, the unsaid requirement that the oracle should always assign correct labels is unreasonable for most situations. We propose an active learning technique for deep neural networks that is more robust to mislabeling than the previously proposed techniques. Previous techniques rely on the task network itself to estimate the novelty of the unlabeled samples, but learning the task (generalization) and selecting samples (out-of-distribution detection) can be conflicting goals. We use a separate network to score the unlabeled samples for selection. The scoring network relies on self-supervision for modeling the distribution of the labeled samples to reduce the dependency on potentially noisy labels. To counter the paucity of data, we also deploy another head on the scoring network for regularization via multi-task learning and use an unusual self-balancing hybrid scoring function. Furthermore, we divide each query into sub-queries before labeling to ensure that the query has diverse samples. In addition to having a higher tolerance to mislabeling of samples by the oracle, the resultant technique also produces competitive accuracy in the absence of label noise. The technique also handles the introduction of new classes on-the-fly well by temporarily increasing the sampling rate of these classes. △ Less

Submitted 28 March, 2021; v1 submitted 29 October, 2020; originally announced October 2020.

arXiv:2010.11125 [pdf, other]

Beyond English-Centric Multilingual Machine Translation

Authors: Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin

Abstract: Existing work in translation demonstrated the potential of massively multilingual machine translation by training a single model able to translate between any pair of languages. However, much of this work is English-Centric by training only on data which was translated from or to English. While this is supported by large sources of training data, it does not reflect translation needs worldwide. In… ▽ More Existing work in translation demonstrated the potential of massively multilingual machine translation by training a single model able to translate between any pair of languages. However, much of this work is English-Centric by training only on data which was translated from or to English. While this is supported by large sources of training data, it does not reflect translation needs worldwide. In this work, we create a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages. We build and open source a training dataset that covers thousands of language directions with supervised data, created through large-scale mining. Then, we explore how to effectively increase model capacity through a combination of dense scaling and language-specific sparse parameters to create high quality models. Our focus on non-English-Centric models brings gains of more than 10 BLEU when directly translating between non-English directions while performing competitively to the best single systems of WMT. We open-source our scripts so that others may reproduce the data, evaluation, and final M2M-100 model. △ Less

Submitted 21 October, 2020; originally announced October 2020.

arXiv:2010.09990 [pdf, other]

doi 10.1021/acscatal.0c04525

The Open Catalyst 2020 (OC20) Dataset and Community Challenges

Authors: Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi

Abstract: Catalyst discovery and optimization is key to solving many societal and energy challenges including solar fuels synthesis, long-term energy storage, and renewable fertilizer production. Despite considerable effort by the catalysis community to apply machine learning models to the computational catalyst discovery process, it remains an open challenge to build models that can generalize across both… ▽ More Catalyst discovery and optimization is key to solving many societal and energy challenges including solar fuels synthesis, long-term energy storage, and renewable fertilizer production. Despite considerable effort by the catalysis community to apply machine learning models to the computational catalyst discovery process, it remains an open challenge to build models that can generalize across both elemental compositions of surfaces and adsorbate identity/configurations, perhaps because datasets have been smaller in catalysis than related fields. To address this we developed the OC20 dataset, consisting of 1,281,040 Density Functional Theory (DFT) relaxations (~264,890,000 single point evaluations) across a wide swath of materials, surfaces, and adsorbates (nitrogen, carbon, and oxygen chemistries). We supplemented this dataset with randomly perturbed structures, short timescale molecular dynamics, and electronic structure analyses. The dataset comprises three central tasks indicative of day-to-day catalyst modeling and comes with pre-defined train/validation/test splits to facilitate direct comparisons with future model development efforts. We applied three state-of-the-art graph neural network models (CGCNN, SchNet, Dimenet++) to each of these tasks as baseline demonstrations for the community to build on. In almost every task, no upper limit on model size was identified, suggesting that even larger models are likely to improve on initial results. The dataset and baseline models are both provided as open resources, as well as a public leader board to encourage community contributions to solve these important tasks. △ Less

Submitted 24 September, 2021; v1 submitted 19 October, 2020; originally announced October 2020.

Comments: 37 pages, 11 figures, submitted to ACS Catalysis

arXiv:2010.09435 [pdf, other]

An Introduction to Electrocatalyst Design using Machine Learning for Renewable Energy Storage

Authors: C. Lawrence Zitnick, Lowik Chanussot, Abhishek Das, Siddharth Goyal, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Thibaut Lavril, Aini Palizhati, Morgane Riviere, Muhammed Shuaibi, Anuroop Sriram, Kevin Tran, Brandon Wood, Junwoong Yoon, Devi Parikh, Zachary Ulissi

Abstract: Scalable and cost-effective solutions to renewable energy storage are essential to addressing the world's rising energy needs while reducing climate change. As we increase our reliance on renewable energy sources such as wind and solar, which produce intermittent power, storage is needed to transfer power from times of peak generation to peak demand. This may require the storage of power for hours… ▽ More Scalable and cost-effective solutions to renewable energy storage are essential to addressing the world's rising energy needs while reducing climate change. As we increase our reliance on renewable energy sources such as wind and solar, which produce intermittent power, storage is needed to transfer power from times of peak generation to peak demand. This may require the storage of power for hours, days, or months. One solution that offers the potential of scaling to nation-sized grids is the conversion of renewable energy to other fuels, such as hydrogen or methane. To be widely adopted, this process requires cost-effective solutions to running electrochemical reactions. An open challenge is finding low-cost electrocatalysts to drive these reactions at high rates. Through the use of quantum mechanical simulations (density functional theory), new catalyst structures can be tested and evaluated. Unfortunately, the high computational cost of these simulations limits the number of structures that may be tested. The use of machine learning may provide a method to efficiently approximate these calculations, leading to new approaches in finding effective electrocatalysts. In this paper, we provide an introduction to the challenges in finding suitable electrocatalysts, how machine learning may be applied to the problem, and the use of the Open Catalyst Project OC20 dataset for model training. △ Less

Submitted 14 October, 2020; originally announced October 2020.

Comments: 27 pages

ACM Class: I.2.6; J.2

arXiv:2003.13991 [pdf, other]

doi 10.1109/WPNC47567.2019.8970257

Indoor Distance Estimation using LSTMs over WLAN Network

Authors: Pranav Sankhe, Saqib Azim, Sachin Goyal, Tanya Choudhary, Kumar Appaiah, Sukumar Srikant

Abstract: The Global Navigation Satellite Systems (GNSS) like GPS suffer from accuracy degradation and are almost unavailable in indoor environments. Indoor positioning systems (IPS) based on WiFi signals have been gaining popularity. However, owing to the strong spatial and temporal variations of wireless communication channels in the indoor environment, the achieved accuracy of existing IPS is around seve… ▽ More The Global Navigation Satellite Systems (GNSS) like GPS suffer from accuracy degradation and are almost unavailable in indoor environments. Indoor positioning systems (IPS) based on WiFi signals have been gaining popularity. However, owing to the strong spatial and temporal variations of wireless communication channels in the indoor environment, the achieved accuracy of existing IPS is around several tens of centimeters. We present the detailed design and implementation of a self-adaptive WiFi-based indoor distance estimation system using LSTMs. The system is novel in its method of estimating with high accuracy the distance of an object by overcoming possible causes of channel variations and is self-adaptive to the changing environmental and surrounding conditions. The proposed design has been developed and physically realized over a WiFi network consisting of ESP8266 (NodeMCU) devices. The experiment were conducted in a real indoor environment while changing the surroundings in order to establish the adaptability of the system. We introduce and compare different architectures for this task based on LSTMs, CNNs, and fully connected networks (FCNs). We show that the LSTM based model performs better among all the above-mentioned architectures by achieving an accuracy of 5.85 cm with a confidence interval of 93% on the scale of (4.14 m * 2.86 m). To the best of our knowledge, the proposed method outperforms other methods reported in the literature by a significant margin. △ Less

Submitted 31 March, 2020; originally announced March 2020.

Comments: Published in IEEE 16th Workshop on Positioning, Navigation and Communications (WPNC 2019, Germany)

Journal ref: 2019 16th IEEE Workshop on Positioning, Navigation and Communications

arXiv:2003.11170 [pdf, other]

Norms and Sanctions as a Basis for Promoting Cybersecurity Practices

Authors: Nirav Ajmeri, Shubham Goyal, Munindar P. Singh

Abstract: Many cybersecurity breaches occur due to users not following good cybersecurity practices, chief among them being regulations for applying software patches to operating systems, updating applications, and maintaining strong passwords. We capture cybersecurity expectations on users as norms. We empirically investigate sanctioning mechanisms in promoting compliance with those norms as well as the… ▽ More Many cybersecurity breaches occur due to users not following good cybersecurity practices, chief among them being regulations for applying software patches to operating systems, updating applications, and maintaining strong passwords. We capture cybersecurity expectations on users as norms. We empirically investigate sanctioning mechanisms in promoting compliance with those norms as well as the detrimental effect of sanctions on the ability of users to complete their work. We realize these ideas in a game that emulates the decision making of workers in a research lab. Through a human-subject study, we find that whereas individual sanctions are more effective than group sanctions in achieving compliance and less detrimental on the ability of users to complete their work, individual sanctions offer significantly lower resilience especially for organizations comprising risk seekers. Our findings have implications for workforce training in cybersecurity. △ Less

Submitted 24 March, 2020; originally announced March 2020.

Comments: 10 pages, 4 figures

arXiv:2002.12718 [pdf, other]

DROCC: Deep Robust One-Class Classification

Authors: Sachin Goyal, Aditi Raghunathan, Moksh Jain, Harsha Vardhan Simhadri, Prateek Jain

Abstract: Classical approaches for one-class problems such as one-class SVM and isolation forest require careful feature engineering when applied to structured domains like images. State-of-the-art methods aim to leverage deep learning to learn appropriate features via two main approaches. The first approach based on predicting transformations (Golan & El-Yaniv, 2018; Hendrycks et al., 2019a) while successf… ▽ More Classical approaches for one-class problems such as one-class SVM and isolation forest require careful feature engineering when applied to structured domains like images. State-of-the-art methods aim to leverage deep learning to learn appropriate features via two main approaches. The first approach based on predicting transformations (Golan & El-Yaniv, 2018; Hendrycks et al., 2019a) while successful in some domains, crucially depends on an appropriate domain-specific set of transformations that are hard to obtain in general. The second approach of minimizing a classical one-class loss on the learned final layer representations, e.g., DeepSVDD (Ruff et al., 2018) suffers from the fundamental drawback of representation collapse. In this work, we propose Deep Robust One-Class Classification (DROCC) that is both applicable to most standard domains without requiring any side-information and robust to representation collapse. DROCC is based on the assumption that the points from the class of interest lie on a well-sampled, locally linear low dimensional manifold. Empirical evaluation demonstrates that DROCC is highly effective in two different one-class problem settings and on a range of real-world datasets across different domains: tabular data, images (CIFAR and ImageNet), audio, and time-series, offering up to 20% increase in accuracy over the state-of-the-art in anomaly detection. Code is available at https://github.com/microsoft/EdgeML. △ Less

Submitted 15 August, 2020; v1 submitted 28 February, 2020; originally announced February 2020.

Comments: 16 pages, 9 figures, Published at International Conference on Machine Learning (ICML) 2020

arXiv:2001.10309 [pdf, other]

New Radio Physical Layer Abstraction for System-Level Simulations of 5G Networks

Authors: Sandra Lagen, Kevin Wanuga, Hussain Elkotby, Sanjay Goyal, Natale Patriciello, Lorenza Giupponi

Abstract: A physical layer (PHY) abstraction model estimates the PHY performance in system-level simulators to speed up the simulations. This paper presents a PHY abstraction model for 5G New Radio (NR) and its integration into an open-source ns-3 based NR system-level simulator. The model capitalizes on the exponential effective signal-to-interference-plus-noise ratio (SINR) mapping (EESM) and considers th… ▽ More A physical layer (PHY) abstraction model estimates the PHY performance in system-level simulators to speed up the simulations. This paper presents a PHY abstraction model for 5G New Radio (NR) and its integration into an open-source ns-3 based NR system-level simulator. The model capitalizes on the exponential effective signal-to-interference-plus-noise ratio (SINR) mapping (EESM) and considers the latest NR specification. To generate it, we used an NR-compliant link-level simulator to calibrate the EESM method as well as to obtain SINR-block error rate (BLER) lookup tables for various NR configurations. We also illustrate the usability of the developed model through end-to-end simulations in ns-3, under different NR settings of modulation and coding schemes, hybrid automatic repeat request combining methods, and link adaptation approaches. △ Less

Submitted 19 April, 2021; v1 submitted 28 January, 2020; originally announced January 2020.

Comments: published in IEEE ICC

arXiv:2001.08950 [pdf, other]

PoWER-BERT: Accelerating BERT Inference via Progressive Word-vector Elimination

Authors: Saurabh Goyal, Anamitra R. Choudhury, Saurabh M. Raje, Venkatesan T. Chakaravarthy, Yogish Sabharwal, Ashish Verma

Abstract: We develop a novel method, called PoWER-BERT, for improving the inference time of the popular BERT model, while maintaining the accuracy. It works by: a) exploiting redundancy pertaining to word-vectors (intermediate encoder outputs) and eliminating the redundant vectors. b) determining which word-vectors to eliminate by developing a strategy for measuring their significance, based on the self-att… ▽ More We develop a novel method, called PoWER-BERT, for improving the inference time of the popular BERT model, while maintaining the accuracy. It works by: a) exploiting redundancy pertaining to word-vectors (intermediate encoder outputs) and eliminating the redundant vectors. b) determining which word-vectors to eliminate by developing a strategy for measuring their significance, based on the self-attention mechanism. c) learning how many word-vectors to eliminate by augmenting the BERT model and the loss function. Experiments on the standard GLUE benchmark shows that PoWER-BERT achieves up to 4.5x reduction in inference time over BERT with <1% loss in accuracy. We show that PoWER-BERT offers significantly better trade-off between accuracy and inference time compared to prior methods. We demonstrate that our method attains up to 6.8x reduction in inference time with <1% loss in accuracy when applied over ALBERT, a highly compressed version of BERT. The code for PoWER-BERT is publicly available at https://github.com/IBM/PoWER-BERT. △ Less

Submitted 8 September, 2020; v1 submitted 24 January, 2020; originally announced January 2020.

Comments: Accepted at ICML 2020

arXiv:2001.04779 [pdf, other]

NR-U and WiGig Coexistence in 60 GHz Bands

Authors: Natale Patriciello, Sanjay Goyal, Sandra Lagen, Lorenza Giupponi, Biljana Bojovic, Alpaslan Demir, Mihaela Beluri

Abstract: In December 2019, the 3GPP defined the road-map for Release-17, which includes new features on the operation of New Radio (NR) in millimeter-wave bands with highly directional communications systems, i.e., up to 52.6 GHz. In this paper, a system-level simulation based study on the coexistence of NR-based access to unlicensed spectrum (NR-U) and an IEEE technology, i.e., 802.11ad Wireless Gigabit (… ▽ More In December 2019, the 3GPP defined the road-map for Release-17, which includes new features on the operation of New Radio (NR) in millimeter-wave bands with highly directional communications systems, i.e., up to 52.6 GHz. In this paper, a system-level simulation based study on the coexistence of NR-based access to unlicensed spectrum (NR-U) and an IEEE technology, i.e., 802.11ad Wireless Gigabit (WiGig), at 60 GHz bands is conducted. For NR-U, an extension of NR Release-15 based model is used such that the 60 GHz regulatory requirements are satisfied. First, the design and capabilities of the developed open source ns-3 based simulator are presented and then end-to-end performance results of coexistence with different channel access mechanisms for NR-U in a 3GPP indoor scenario are discussed. It is shown that NR-U with Listen-Before-Talk channel access mechanism does not have any adverse impact on WiGig performance in terms of throughput and latency, which demonstrates that NR-U design fulfills the fairness coexistence objective, i.e., NR-U and WiGig coexistence is proven to be feasible. △ Less

Submitted 21 January, 2020; v1 submitted 14 January, 2020; originally announced January 2020.

arXiv:1907.09273 [pdf, other]

Why Build an Assistant in Minecraft?

Authors: Arthur Szlam, Jonathan Gray, Kavya Srinet, Yacine Jernite, Armand Joulin, Gabriel Synnaeve, Douwe Kiela, Haonan Yu, Zhuoyuan Chen, Siddharth Goyal, Demi Guo, Danielle Rothermel, C. Lawrence Zitnick, Jason Weston

Abstract: In this document we describe a rationale for a research program aimed at building an open "assistant" in the game Minecraft, in order to make progress on the problems of natural language understanding and learning from dialogue. In this document we describe a rationale for a research program aimed at building an open "assistant" in the game Minecraft, in order to make progress on the problems of natural language understanding and learning from dialogue. △ Less

Submitted 25 July, 2019; v1 submitted 22 July, 2019; originally announced July 2019.

Showing 1–50 of 76 results for author: Goyal, S