Showing 1–50 of 72 results for author: Mahajan, D

Searching in archive cs.
  1. arXiv:2407.21783  [pdf, other]

    cs.AI cs.CL cs.CV

    The Llama 3 Herd of Models

    Authors: Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, Bethany Biron, Binh Tang, et al. (510 additional authors not shown)

    Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical…

    Submitted 15 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  2. arXiv:2407.13853  [pdf, other]

    cs.LG cs.PF

    Data-driven Forecasting of Deep Learning Performance on GPUs

    Authors: Seonho Lee, Amar Phanishayee, Divya Mahajan

    Abstract: Deep learning kernels exhibit predictable memory accesses and compute patterns, making GPUs' parallel architecture well-suited for their execution. Software and runtime systems for GPUs are optimized to better utilize the stream multiprocessors, on-chip cache, and off-chip high-bandwidth memory. As deep learning models and GPUs evolve, access to newer GPUs is often limited, raising questions about…

    Submitted 18 July, 2024; originally announced July 2024.

  3. arXiv:2407.13143  [pdf, other]

    cs.LG cs.AR cs.DC

    Integrated Hardware Architecture and Device Placement Search

    Authors: Irene Wang, Jakub Tarnawski, Amar Phanishayee, Divya Mahajan

    Abstract: Distributed execution of deep learning training involves a dynamic interplay between hardware accelerator architecture and device placement strategy. This is the first work to explore the co-optimization of determining the optimal architecture and device placement strategy through novel algorithms, improving the balance of computational resources, memory usage, and data distribution. Our architect…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted at the 41st International Conference on Machine Learning (ICML), 2024

  4. arXiv:2405.05734  [pdf, other]

    cs.IT q-bio.GN

    On the Coverage Required for Diploid Genome Assembly

    Authors: Daanish Mahajan, Chirag Jain, Navin Kashyap

    Abstract: We investigate the information-theoretic conditions to achieve the complete reconstruction of a diploid genome. We also analyze the standard greedy and de-Bruijn graph-based algorithms and compare the coverage depth and read length requirements with the information-theoretic lower bound. Our results show that the gap between the two is considerable because both algorithms require the double repeat…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: Accepted at ISIT'24

  5. arXiv:2404.14632  [pdf, other]

    cs.AR cs.DC

    Workload-Aware Hardware Accelerator Mining for Distributed Deep Learning Training

    Authors: Muhammad Adnan, Amar Phanishayee, Janardhan Kulkarni, Prashant J. Nair, Divya Mahajan

    Abstract: In this paper, we present a novel technique to search for hardware architectures of accelerators optimized for end-to-end training of deep neural networks (DNNs). Our approach addresses both single-device and distributed pipeline and tensor model parallel scenarios, the latter being addressed for the first time. The search optimized accelerators for training relevant metrics such as throughput/TDP und…

    Submitted 22 April, 2024; originally announced April 2024.

  6. arXiv:2404.05545  [pdf, other]

    cs.LG cs.AI cs.CL stat.ME

    Evaluating Interventional Reasoning Capabilities of Large Language Models

    Authors: Tejas Kasetty, Divyat Mahajan, Gintare Karolina Dziugaite, Alexandre Drouin, Dhanya Sridhar

    Abstract: Numerous decision-making tasks require estimating causal effects under interventions on different parts of a system. As practitioners consider using large language models (LLMs) to automate decisions, studying their causal reasoning capabilities becomes crucial. A recent line of work evaluates LLMs' ability to retrieve commonsense causal facts, but these evaluations do not sufficiently assess how L…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: 17 pages

  7. arXiv:2404.04270  [pdf, other]

    cs.IR cs.LG

    Accelerating Recommender Model Training by Dynamically Skipping Stale Embeddings

    Authors: Yassaman Ebrahimzadeh Maboud, Muhammad Adnan, Divya Mahajan, Prashant J. Nair

    Abstract: Training recommendation models poses significant challenges regarding resource utilization and performance. Prior research has proposed an approach that categorizes embeddings into popular and non-popular classes to reduce the training time for recommendation models. We observe that, even among the popular embeddings, certain embeddings undergo rapid training and exhibit minimal subsequent variatio…

    Submitted 21 March, 2024; originally announced April 2024.

  8. arXiv:2403.11472  [pdf, other]

    cs.LG cs.AR cs.DB

    Accelerating String-Key Learned Index Structures via Memoization-based Incremental Training

    Authors: Minsu Kim, Jinwoo Hwang, Guseul Heo, Seiyeon Cho, Divya Mahajan, Jongse Park

    Abstract: Learned indexes use machine learning models to learn the mappings between keys and their corresponding positions in key-value indexes. These indexes use the mapping information as training data. Learned indexes require frequent retrainings of their models to incorporate the changes introduced by update queries. To efficiently retrain the models, existing learned index systems often harness a linea…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted at VLDB '24; 12 pages + 2 pages (ref), 18 figures, 2 tables

  9. NeuPIMs: NPU-PIM Heterogeneous Acceleration for Batched LLM Inferencing

    Authors: Guseul Heo, Sangyeop Lee, Jaehong Cho, Hyunmin Choi, Sanghyeon Lee, Hyungkyu Ham, Gwangsun Kim, Divya Mahajan, Jongse Park

    Abstract: Modern transformer-based Large Language Models (LLMs) are constructed with a series of decoder blocks. Each block comprises three key components: (1) QKV generation, (2) multi-head attention, and (3) feed-forward networks. In batched processing, QKV generation and feed-forward networks involve compute-intensive matrix-matrix multiplications (GEMM), while multi-head attention requires bandwidth-hea…

    Submitted 29 March, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

    Comments: 16 pages, 15 figures

    Journal ref: ASPLOS 2024

  10. arXiv:2312.03584  [pdf, other]

    cs.CV

    Context Diffusion: In-Context Aware Image Generation

    Authors: Ivona Najdenkoska, Animesh Sinha, Abhimanyu Dubey, Dhruv Mahajan, Vignesh Ramanathan, Filip Radenovic

    Abstract: We propose Context Diffusion, a diffusion-based framework that enables image generation models to learn from visual examples presented in context. Recent work tackles such in-context learning for image generation, where a query image is provided alongside context examples and text prompts. However, the quality and fidelity of the generated images deteriorate when the prompt is not present, demonst…

    Submitted 6 December, 2023; originally announced December 2023.

  11. arXiv:2311.16958  [pdf]

    cs.NE

    From Simulations to Reality: Enhancing Multi-Robot Exploration for Urban Search and Rescue

    Authors: Gautam Siddharth Kashyap, Deepkashi Mahajan, Orchid Chetia Phukan, Ankit Kumar, Alexander E. I. Brownlee, Jiechao Gao

    Abstract: In this study, we present a novel hybrid algorithm, combining Levy Flight (LF) and Particle Swarm Optimization (PSO) (LF-PSO), tailored for efficient multi-robot exploration in unknown environments with limited communication and no global positioning information. The research addresses the growing interest in employing multiple autonomous robots for exploration tasks, particularly in scenarios suc…

    Submitted 28 November, 2023; originally announced November 2023.

  12. arXiv:2311.10794  [pdf, other]

    cs.CV

    Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression

    Authors: Animesh Sinha, Bo Sun, Anmol Kalia, Arantxa Casanova, Elliot Blanchard, David Yan, Winnie Zhang, Tony Nelli, Jiahui Chen, Hardik Shah, Licheng Yu, Mitesh Kumar Singh, Ankit Ramchandani, Maziar Sanjabi, Sonal Gupta, Amy Bearman, Dhruv Mahajan

    Abstract: We introduce Style Tailoring, a recipe to finetune Latent Diffusion Models (LDMs) in a distinct domain with high visual quality, prompt alignment and scene diversity. We choose sticker image generation as the target domain, as the images significantly differ from photorealistic samples typically generated by large-scale LDMs. We start with a competent text-to-image model, like Emu, and show that r…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: 10 pages, 5 figures

  13. arXiv:2309.15807  [pdf, other]

    cs.CV

    Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack

    Authors: Xiaoliang Dai, Ji Hou, Chih-Yao Ma, Sam Tsai, Jialiang Wang, Rui Wang, Peizhao Zhang, Simon Vandenhende, Xiaofang Wang, Abhimanyu Dubey, Matthew Yu, Abhishek Kadian, Filip Radenovic, Dhruv Mahajan, Kunpeng Li, Yue Zhao, Vladan Petrovic, Mitesh Kumar Singh, Simran Motwani, Yi Wen, Yiwen Song, Roshan Sumbaly, Vignesh Ramanathan, Zijian He, Peter Vajda, et al. (1 additional author not shown)

    Abstract: Training text-to-image models with web scale image-text pairs enables the generation of a wide range of visual concepts from text. However, these pre-trained models often face challenges when it comes to generating highly aesthetic images. This creates the need for aesthetic alignment post pre-training. In this paper, we propose quality-tuning to effectively guide a pre-trained model to exclusivel…

    Submitted 27 September, 2023; originally announced September 2023.

  14. arXiv:2308.14902  [pdf, other]

    cs.IR cs.LG

    Ad-Rec: Advanced Feature Interactions to Address Covariate-Shifts in Recommendation Networks

    Authors: Muhammad Adnan, Yassaman Ebrahimzadeh Maboud, Divya Mahajan, Prashant J. Nair

    Abstract: Recommendation models are vital in delivering personalized user experiences by leveraging the correlation between multiple input features. However, deep learning-based recommendation models often face challenges due to evolving user behaviour and item features, leading to covariate shifts. Effective cross-feature learning is crucial to handle data distribution drift and adapting to changing user b…

    Submitted 28 August, 2023; originally announced August 2023.

  15. arXiv:2307.02623  [pdf, other]

    cs.LG cs.DC

    FLuID: Mitigating Stragglers in Federated Learning using Invariant Dropout

    Authors: Irene Wang, Prashant J. Nair, Divya Mahajan

    Abstract: Federated Learning (FL) allows machine learning models to train locally on individual mobile devices, synchronizing model updates via a shared server. This approach safeguards user privacy; however, it also generates a heterogeneous training environment due to the varying performance capabilities across devices. As a result, straggler devices with lower performance often dictate the overall traini…

    Submitted 26 September, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

    Comments: Accepted at the 37th Conference on Neural Information Processing Systems (NeurIPS), 2023

  16. arXiv:2307.02598  [pdf, other]

    cs.LG stat.ML

    Additive Decoders for Latent Variables Identification and Cartesian-Product Extrapolation

    Authors: Sébastien Lachapelle, Divyat Mahajan, Ioannis Mitliagkas, Simon Lacoste-Julien

    Abstract: We tackle the problems of latent variables identification and "out-of-support" image generation in representation learning. We show that both are possible for a class of decoders that we call additive, which are reminiscent of decoders used for object-centric representation learning (OCRL) and well suited for images that can be decomposed as a sum of object-specific images. We provide conditions…

    Submitted 2 November, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

    Comments: Appears in: Advances in Neural Information Processing Systems 37 (NeurIPS 2023). 39 pages

    ACM Class: I.2.6; I.5.1

  17. arXiv:2306.10452  [pdf, other]

    cs.CL

    MISMATCH: Fine-grained Evaluation of Machine-generated Text with Mismatch Error Types

    Authors: Keerthiram Murugesan, Sarathkrishna Swaminathan, Soham Dan, Subhajit Chaudhury, Chulaka Gunasekara, Maxwell Crouse, Diwakar Mahajan, Ibrahim Abdelaziz, Achille Fokoue, Pavan Kapanipathi, Salim Roukos, Alexander Gray

    Abstract: With the growing interest in large language models, the need for evaluating the quality of machine text compared to reference (typically human-generated) text has become focal attention. Most recent works focus either on task-specific evaluation metrics or study the properties of machine-generated text captured by the existing metrics. In this work, we propose a new evaluation scheme to model huma…

    Submitted 17 June, 2023; originally announced June 2023.

    Comments: Accepted at ACL 2023 (ACL Findings Long)

  18. arXiv:2305.06025  [pdf]

    eess.IV cs.CV

    Brain Tumor Detection using Swin Transformers

    Authors: Prateek A. Meshram, Suraj Joshi, Devarshi Mahajan

    Abstract: The first MRI scan was done in the year 1978 by researchers at EML Laboratories. As per an estimate, approximately 251,329 people died due to primary cancerous brain and CNS (Central Nervous System) Tumors in the year 2020. It has been recommended by various medical professionals that brain tumor detection at an early stage would help in saving many lives. Whenever radiologists deal with a brain M…

    Submitted 10 May, 2023; originally announced May 2023.

  19. arXiv:2302.08091  [pdf, other]

    cs.CL

    Do We Still Need Clinical Language Models?

    Authors: Eric Lehman, Evan Hernandez, Diwakar Mahajan, Jonas Wulff, Micah J. Smith, Zachary Ziegler, Daniel Nadler, Peter Szolovits, Alistair Johnson, Emily Alsentzer

    Abstract: Although recent advances in scaling large language models (LLMs) have resulted in improvements on many NLP tasks, it remains unclear whether these models trained primarily with general web text are the right tool in highly specialized, safety critical domains such as clinical text. Recent results have suggested that LLMs encode a surprising amount of medical knowledge. This raises an important que…

    Submitted 16 February, 2023; originally announced February 2023.

  20. arXiv:2301.02280  [pdf, other]

    cs.CV

    Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training

    Authors: Filip Radenovic, Abhimanyu Dubey, Abhishek Kadian, Todor Mihaylov, Simon Vandenhende, Yash Patel, Yi Wen, Vignesh Ramanathan, Dhruv Mahajan

    Abstract: Vision-language models trained with contrastive learning on large-scale noisy data are becoming increasingly popular for zero-shot recognition problems. In this paper we improve the following three aspects of the contrastive pre-training pipeline: dataset noise, model initialization and the training objective. First, we propose a straightforward filtering strategy titled Complexity, Action, and Te…

    Submitted 29 March, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

    Comments: CVPR 2023

  21. arXiv:2301.01795  [pdf, other]

    cs.CV

    PACO: Parts and Attributes of Common Objects

    Authors: Vignesh Ramanathan, Anmol Kalia, Vladan Petrovic, Yi Wen, Baixue Zheng, Baishan Guo, Rui Wang, Aaron Marquez, Rama Kovvuri, Abhishek Kadian, Amir Mousavi, Yiwen Song, Abhimanyu Dubey, Dhruv Mahajan

    Abstract: Object models are gradually progressing from predicting just category labels to providing detailed descriptions of object instances. This motivates the need for large datasets which go beyond traditional object masks and provide richer annotations such as part masks and attributes. Hence, we introduce PACO: Parts and Attributes of Common Objects. It spans 75 object categories, 456 object-part cate…

    Submitted 4 January, 2023; originally announced January 2023.

  22. arXiv:2211.14666  [pdf, other]

    cs.LG stat.ML

    Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning

    Authors: Sébastien Lachapelle, Tristan Deleu, Divyat Mahajan, Ioannis Mitliagkas, Yoshua Bengio, Simon Lacoste-Julien, Quentin Bertrand

    Abstract: Although disentangled representations are often said to be beneficial for downstream tasks, current empirical and theoretical understanding is limited. In this work, we provide evidence that disentangled representations coupled with sparse base-predictors improve generalization. In the context of multi-task learning, we prove a new identifiability result that provides conditions under which maxima…

    Submitted 6 June, 2023; v1 submitted 26 November, 2022; originally announced November 2022.

    Comments: Appears in: Fortieth International Conference on Machine Learning (ICML 2023). 36 pages

    ACM Class: I.2.6; I.5.1

  23. arXiv:2211.01939  [pdf, other]

    cs.LG cs.AI stat.ME

    Empirical Analysis of Model Selection for Heterogeneous Causal Effect Estimation

    Authors: Divyat Mahajan, Ioannis Mitliagkas, Brady Neal, Vasilis Syrgkanis

    Abstract: We study the problem of model selection in causal inference, specifically for conditional average treatment effect (CATE) estimation. Unlike machine learning, there is no perfect analogue of cross-validation for model selection as we do not observe the counterfactual potential outcomes. Towards this, a variety of surrogate metrics have been proposed for CATE model selection that use only observed…

    Submitted 29 April, 2024; v1 submitted 3 November, 2022; originally announced November 2022.

    Comments: Proceedings of the 12th International Conference on Learning Representations (ICLR), 2024. (Spotlight)

  24. arXiv:2209.11924  [pdf, other]

    stat.ML cs.LG

    Interventional Causal Representation Learning

    Authors: Kartik Ahuja, Divyat Mahajan, Yixin Wang, Yoshua Bengio

    Abstract: Causal representation learning seeks to extract high-level latent factors from low-level sensory data. Most existing methods rely on observational data and structural assumptions (e.g., conditional independence) to identify the latent factors. However, interventional data is prevalent across applications. Can interventional data facilitate causal representation learning? We explore this question i…

    Submitted 22 February, 2024; v1 submitted 24 September, 2022; originally announced September 2022.

  25. Extracting Medication Changes in Clinical Narratives using Pre-trained Language Models

    Authors: Giridhar Kaushik Ramachandran, Kevin Lybarger, Yaya Liu, Diwakar Mahajan, Jennifer J. Liang, Ching-Huei Tsou, Meliha Yetisgen, Özlem Uzuner

    Abstract: An accurate and detailed account of patient medications, including medication changes within the patient timeline, is essential for healthcare providers to provide appropriate patient care. Healthcare providers or the patients themselves may initiate changes to patient medication. Medication changes take many forms, including prescribed medication and associated dosage modification. These changes…

    Submitted 12 January, 2023; v1 submitted 17 August, 2022; originally announced August 2022.

    Journal ref: Journal of Biomedical Informatics.139.2023.104302.1532-0464

  26. arXiv:2207.06820  [pdf, other]

    cs.DB

    Using Fuzzy Matching of Queries to optimize Database workloads

    Authors: Sweta Singh, Vaibhav Kulkarni, Mario Briggs, Deepak Mahajan, Eitan Farchi

    Abstract: Directed Acyclic Graphs (DAGs) are commonly used in Databases and Big Data computational engines like Apache Spark for representing the execution plan of queries. We refer to such graphs as Query Directed Acyclic Graphs (QDAGs). This paper uses similarity hashing to arrive at a fingerprint such that the fingerprint embodies the compute requirements of the query for QDAGs. The fingerprint, thus obt…

    Submitted 14 July, 2022; originally announced July 2022.

    Comments: 9 pages, 5 figures

  27. arXiv:2205.14120  [pdf, other]

    cs.LG cs.CV

    Neural Basis Models for Interpretability

    Authors: Filip Radenovic, Abhimanyu Dubey, Dhruv Mahajan

    Abstract: Due to the widespread use of complex machine learning models in real-world applications, it is becoming critical to explain model predictions. However, these models are typically black-box deep neural networks, explained post-hoc via methods with known faithfulness limitations. Generalized Additive Models (GAMs) are an inherently interpretable class of models that address this limitation by learni…

    Submitted 18 October, 2022; v1 submitted 27 May, 2022; originally announced May 2022.

    Comments: 17 pages including appendix. v2 includes link to source code available at https://github.com/facebookresearch/nbm-spam. v3 includes updates to baseline results, v4 updated for NeurIPS camera ready

  28. arXiv:2205.14108  [pdf, other]

    cs.LG cs.CV

    Scalable Interpretability via Polynomials

    Authors: Abhimanyu Dubey, Filip Radenovic, Dhruv Mahajan

    Abstract: Generalized Additive Models (GAMs) have quickly become the leading choice for inherently-interpretable machine learning. However, unlike uninterpretable methods such as DNNs, they lack expressive power and easy scalability, and are hence not a feasible alternative for real-world tasks. We present a new class of GAMs that use tensor rank decompositions of polynomials to learn powerful, inheren…

    Submitted 18 October, 2022; v1 submitted 27 May, 2022; originally announced May 2022.

    Comments: 26 pages including appendix. v2 includes source code link at https://github.com/facebookresearch/nbm-spam, v3 fixes to baseline results in Table 1, v4 update for NeurIPS camera ready

  29. arXiv:2204.05436  [pdf, other]

    cs.AR cs.AI cs.LG

    Heterogeneous Acceleration Pipeline for Recommendation System Training

    Authors: Muhammad Adnan, Yassaman Ebrahimzadeh Maboud, Divya Mahajan, Prashant J. Nair

    Abstract: Recommendation models rely on deep learning networks and large embedding tables, resulting in computationally and memory-intensive processes. These models are typically trained using hybrid CPU-GPU or GPU-only configurations. The hybrid mode combines the GPU's neural network acceleration with the CPUs' memory storage and supply for embedding tables but may incur significant CPU-to-GPU transfer tim…

    Submitted 28 April, 2024; v1 submitted 11 April, 2022; originally announced April 2022.

    Comments: Accepted at The International Symposium on Computer Architecture (ISCA), 2024

  30. arXiv:2204.04606  [pdf, other]

    cs.LG cs.AI stat.ML

    Towards efficient representation identification in supervised learning

    Authors: Kartik Ahuja, Divyat Mahajan, Vasilis Syrgkanis, Ioannis Mitliagkas

    Abstract: Humans have a remarkable ability to disentangle complex sensory inputs (e.g., image, text) into simple factors of variation (e.g., shape, color) without much supervision. This ability has inspired many works that attempt to solve the following question: how do we invert the data generation process to extract those factors with minimal or no supervision? Several works in the literature on non-linea…

    Submitted 10 April, 2022; originally announced April 2022.

    Comments: Proceedings of the First Conference on Causal Learning and Reasoning

  31. arXiv:2203.12892  [pdf, other]

    cs.CV

    Making Heads or Tails: Towards Semantically Consistent Visual Counterfactuals

    Authors: Simon Vandenhende, Dhruv Mahajan, Filip Radenovic, Deepti Ghadiyaram

    Abstract: A visual counterfactual explanation replaces image regions in a query image with regions from a distractor image such that the system's decision on the transformed image changes to the distractor class. In this work, we present a novel framework for computing visual counterfactual explanations based on two key ideas. First, we enforce that the replaced and replacer regions contain the same semanti…

    Submitted 16 July, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

    Comments: Camera-ready version ECCV 2022

  32. arXiv:2201.08371  [pdf, other]

    cs.CV

    Revisiting Weakly Supervised Pre-Training of Visual Perception Models

    Authors: Mannat Singh, Laura Gustafson, Aaron Adcock, Vinicius de Freitas Reis, Bugra Gedik, Raj Prateek Kosaraju, Dhruv Mahajan, Ross Girshick, Piotr Dollár, Laurens van der Maaten

    Abstract: Model pre-training is a cornerstone of modern visual recognition systems. Although fully supervised pre-training on datasets like ImageNet is still the de-facto standard, recent studies suggest that large-scale weakly supervised pre-training can outperform fully supervised approaches. This paper revisits weakly-supervised pre-training of models using hashtag supervision with modern versions of res…

    Submitted 2 April, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

    Comments: CVPR 2022

  33. arXiv:2112.04766  [pdf, other]

    cs.LG cs.CV

    Adaptive Methods for Aggregated Domain Generalization

    Authors: Xavier Thomas, Dhruv Mahajan, Alex Pentland, Abhimanyu Dubey

    Abstract: Domain generalization involves learning a classifier from a heterogeneous collection of training sources such that it generalizes to data drawn from similar unknown target domains, with applications in large-scale learning and personalized inference. In many settings, privacy concerns prohibit obtaining domain labels for the training data samples, and instead only have an aggregated collection of…

    Submitted 23 December, 2021; v1 submitted 9 December, 2021; originally announced December 2021.

  34. arXiv:2110.03369  [pdf, other]

    cs.LG cs.AI cs.CR

    The Connection between Out-of-Distribution Generalization and Privacy of ML Models

    Authors: Divyat Mahajan, Shruti Tople, Amit Sharma

    Abstract: With the goal of generalizing to out-of-distribution (OOD) data, recent domain generalization methods aim to learn "stable" feature representations whose effect on the output remains invariant across domains. Given the theoretical connection between generalization and privacy, we ask whether better OOD generalization leads to better privacy for machine learning models, where privacy is measured th…

    Submitted 7 October, 2021; originally announced October 2021.

    Comments: Prior version accepted at Workshop on Privacy Preserving Machine Learning, NeurIPS 2020. Code: https://github.com/microsoft/robustdg

  35. arXiv:2105.13995  [pdf, other]

    cs.CL

    SemEval-2021 Task 9: Fact Verification and Evidence Finding for Tabular Data in Scientific Documents (SEM-TAB-FACTS)

    Authors: Nancy X. R. Wang, Diwakar Mahajan, Marina Danilevsky, Sara Rosenthal

    Abstract: Understanding tables is an important and relevant task that involves understanding table structure as well as being able to compare and contrast information within cells. In this paper, we address this challenge by presenting a new dataset and tasks that addresses this goal in a shared task in SemEval 2020 Task 9: Fact Verification and Evidence Finding for Tabular Data in Scientific Documents (SEM…

    Submitted 28 May, 2021; originally announced May 2021.

    Comments: To Appear in SemEval 2021

  36. arXiv:2105.11373  [pdf, other]

    cs.CV

    Large-Scale Attribute-Object Compositions

    Authors: Filip Radenovic, Animesh Sinha, Albert Gordo, Tamara Berg, Dhruv Mahajan

    Abstract: We study the problem of learning how to predict attribute-object compositions from images, and its generalization to unseen compositions missing from the training data. To the best of our knowledge, this is a first large-scale study of this problem, involving hundreds of thousands of compositions. We train our framework with images from Instagram using hashtags as noisy weak supervision. We make c…

    Submitted 24 May, 2021; originally announced May 2021.

  37. arXiv:2104.01567  [pdf, other]

    cs.CL

    MCL@IITK at SemEval-2021 Task 2: Multilingual and Cross-lingual Word-in-Context Disambiguation using Augmented Data, Signals, and Transformers

    Authors: Rohan Gupta, Jay Mundra, Deepak Mahajan, Ashutosh Modi

    Abstract: In this work, we present our approach for solving the SemEval 2021 Task 2: Multilingual and Cross-lingual Word-in-Context Disambiguation (MCL-WiC). The task is a sentence pair classification problem where the goal is to detect whether a given word common to both the sentences evokes the same meaning. We submit systems for both the settings - Multilingual (the pair's sentences belong to the same la…

    Submitted 4 April, 2021; originally announced April 2021.

    Comments: Accepted at SemEval 2021 Task 2, 10 Pages (8 Pages main content+ 2 pages for references)

  38. arXiv:2103.15796  [pdf, other]

    cs.CV cs.LG

    Adaptive Methods for Real-World Domain Generalization

    Authors: Abhimanyu Dubey, Vignesh Ramanathan, Alex Pentland, Dhruv Mahajan

    Abstract: Invariant approaches have been remarkably successful in tackling the problem of domain generalization, where the objective is to perform inference on data distributions different from those used in training. In our work, we investigate whether it is possible to leverage domain information from the unseen test samples themselves. We propose a domain-adaptive approach consisting of two steps: a) we…

    Submitted 29 March, 2021; v1 submitted 29 March, 2021; originally announced March 2021.

    Comments: To appear as an oral presentation in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021. v2 corrects double printing of appendix

  39. arXiv:2103.12886  [pdf, other]

    cs.CV

    Weakly Supervised Instance Segmentation for Videos with Temporal Mask Consistency

    Authors: Qing Liu, Vignesh Ramanathan, Dhruv Mahajan, Alan Yuille, Zhenheng Yang

    Abstract: Weakly supervised instance segmentation reduces the cost of annotations required to train models. However, existing approaches which rely only on image-level class labels predominantly suffer from errors due to (a) partial segmentation of objects and (b) missing object predictions. We show that these issues can be better addressed by training with weakly labeled videos instead of images. In videos…

    Submitted 23 March, 2021; originally announced March 2021.

    Comments: 14 pages, 8 figures, accepted by CVPR 2021

  40. arXiv:2103.00686  [pdf, other]

    cs.IR cs.AI cs.AR cs.LG

    Accelerating Recommendation System Training by Leveraging Popular Choices

    Authors: Muhammad Adnan, Yassaman Ebrahimzadeh Maboud, Divya Mahajan, Prashant J. Nair

    Abstract: Recommender models are commonly used to suggest relevant items to a user for e-commerce and online advertisement-based applications. These models use massive embedding tables to store numerical representation of items' and users' categorical variables (memory intensive) and employ neural networks (compute intensive) to generate final recommendations. Training these large-scale recommendation model…

    Submitted 28 September, 2021; v1 submitted 28 February, 2021; originally announced March 2021.

    ACM Class: I.2.6; C.5.0

    Journal ref: Proceedings of the VLDB Endowment, 2022

  41. arXiv:2102.05843  [pdf, other]

    cs.CV cs.LG

    Driving Style Representation in Convolutional Recurrent Neural Network Model of Driver Identification

    Authors: Sobhan Moosavi, Pravar D. Mahajan, Srinivasan Parthasarathy, Colleen Saunders-Chukwu, Rajiv Ramnath

    Abstract: Identifying driving styles is the task of analyzing the behavior of drivers in order to capture variations that will serve to discriminate different drivers from each other. This task has become a prerequisite for a variety of applications, including usage-based insurance, driver coaching, driver action prediction, and even in designing autonomous vehicles; because driving style encodes essential…

    Submitted 10 February, 2021; originally announced February 2021.

    Comments: 12 pages, research on driving style representation

  42. arXiv:2011.08835  [pdf, other]

    cs.CL

    Toward Understanding Clinical Context of Medication Change Events in Clinical Narratives

    Authors: Diwakar Mahajan, Jennifer J Liang, Ching-Huei Tsou

    Abstract: Understanding medication events in clinical narratives is essential to achieving a complete picture of a patient's medication history. While prior research has explored classification of medication changes from clinical notes, studies to date have not considered the necessary clinical context needed for their use in real-world applications, such as medication timeline generation and medication rec…

    Submitted 19 May, 2021; v1 submitted 17 November, 2020; originally announced November 2020.

    Comments: Machine Learning for Health (ML4H) at NeurIPS 2020 - Extended Abstract

  43. arXiv:2011.05877  [pdf, other]

    stat.ME cs.LG

    Split-Treatment Analysis to Rank Heterogeneous Causal Effects for Prospective Interventions

    Authors: Yanbo Xu, Divyat Mahajan, Liz Manrao, Amit Sharma, Emre Kiciman

    Abstract: For many kinds of interventions, such as a new advertisement, marketing intervention, or feature recommendation, it is important to target a specific subset of people for maximizing its benefits at minimum cost or potential harm. However, a key challenge is that no data is available about the effect of such a prospective intervention since it has not been deployed yet. In this work, we propose a s…

    Submitted 11 November, 2020; originally announced November 2020.

    Comments: To be published in WSDM

  44. Towards Unifying Feature Attribution and Counterfactual Explanations: Different Means to the Same End

    Authors: Ramaravind Kommiya Mothilal, Divyat Mahajan, Chenhao Tan, Amit Sharma

    Abstract: Feature attributions and counterfactual explanations are popular approaches to explain a ML model. The former assigns an importance score to each input feature, while the latter provides input examples with minimal changes to alter the model's predictions. To unify these approaches, we provide an interpretation based on the actual causality framework and present two key results in terms of their u…

    Submitted 29 May, 2021; v1 submitted 10 November, 2020; originally announced November 2020.

    Comments: 15 pages, 10 figures

  45. WNTRAC: AI Assisted Tracking of Non-pharmaceutical Interventions Implemented Worldwide for COVID-19

    Authors: Parthasarathy Suryanarayanan, Ching-Huei Tsou, Ananya Poddar, Diwakar Mahajan, Bharath Dandala, Piyush Madan, Anshul Agrawal, Charles Wachira, Osebe Mogaka Samuel, Osnat Bar-Shira, Clifton Kipchirchir, Sharon Okwako, William Ogallo, Fred Otieno, Timothy Nyota, Fiona Matu, Vesna Resende Barros, Daniel Shats, Oren Kagan, Sekou Remy, Oliver Bent, Pooja Guhan, Shilpa Mahatma, Aisha Walcott-Bryant, Divya Pathak, et al. (1 additional author not shown)

    Abstract: The Coronavirus disease 2019 (COVID-19) global pandemic has transformed almost every facet of human society throughout the world. Against an emerging, highly transmissible disease with no definitive treatment or vaccine, governments worldwide have implemented non-pharmaceutical intervention (NPI) to slow the spread of the virus. Examples of such interventions include community actions (e.g. school…

    Submitted 4 January, 2021; v1 submitted 2 September, 2020; originally announced September 2020.

    Comments: Updated title (Artificial Intelligence => AI). Updated figures. Referenced the open-sourced code repository in Code Availability section. Updated figures in the Usage Notes section

  46. arXiv:2008.05700  [pdf, other]

    cs.CV

    What leads to generalization of object proposals?

    Authors: Rui Wang, Dhruv Mahajan, Vignesh Ramanathan

    Abstract: Object proposal generation is often the first step in many detection models. It is lucrative to train a good proposal model, that generalizes to unseen classes. This could help scaling detection models to larger number of classes with fewer annotations. Motivated by this, we study how a detection model trained on a small set of source classes can provide proposals that generalize to unseen classes…

    Submitted 13 August, 2020; originally announced August 2020.

  47. arXiv:2006.16423  [pdf, other]

    cs.LG cs.DC stat.ML

    Efficient Algorithms for Device Placement of DNN Graph Operators

    Authors: Jakub Tarnawski, Amar Phanishayee, Nikhil R. Devanur, Divya Mahajan, Fanny Nina Paravecino

    Abstract: Modern machine learning workloads use large models, with complex structures, that are very expensive to execute. The devices that execute complex models are becoming increasingly heterogeneous as we see a flourishing of domain-specific accelerators being offered as hardware accelerators in addition to CPUs. These trends necessitate distributing the workload across multiple devices. Recent work has…

    Submitted 29 October, 2020; v1 submitted 29 June, 2020; originally announced June 2020.

    Comments: Accepted to NeurIPS 2020

  48. arXiv:2006.07500  [pdf, other]

    cs.LG cs.AI stat.ML

    Domain Generalization using Causal Matching

    Authors: Divyat Mahajan, Shruti Tople, Amit Sharma

    Abstract: In the domain generalization literature, a common objective is to learn representations independent of the domain after conditioning on the class label. We show that this objective is not sufficient: there exist counter-examples where a model fails to generalize to unseen domains even after satisfying class-conditional domain invariance. We formalize this observation through a structural causal mo…

    Submitted 29 June, 2021; v1 submitted 12 June, 2020; originally announced June 2020.

    Comments: Proceedings of the 38th International Conference on Machine Learning (ICML), PMLR 139, 2021. (Long Talk)

  49. arXiv:2005.10899  [pdf, other]

    cs.CL cs.IR

    Extracting Daily Dosage from Medication Instructions in EHRs: An Automated Approach and Lessons Learned

    Authors: Diwakar Mahajan, Jennifer J. Liang, Ching-Huei Tsou

    Abstract: Medication timelines have been shown to be effective in helping physicians visualize complex patient medication information. A key feature in many such designs is a longitudinal representation of a medication's daily dosage and its changes over time. However, daily dosage as a discrete value is generally not provided and needs to be derived from free text instructions (Sig). Existing works in dail…

    Submitted 28 October, 2021; v1 submitted 21 May, 2020; originally announced May 2020.

    Comments: 10 pages, 4 figures, 9 tables

  50. arXiv:2001.03152  [pdf, other]

    cs.CV cs.LG

    Don't Judge an Object by Its Context: Learning to Overcome Contextual Bias

    Authors: Krishna Kumar Singh, Dhruv Mahajan, Kristen Grauman, Yong Jae Lee, Matt Feiszli, Deepti Ghadiyaram

    Abstract: Existing models often leverage co-occurrences between objects and their context to improve recognition accuracy. However, strongly relying on context risks a model's generalizability, especially when typical co-occurrence patterns are absent. This work focuses on addressing such contextual biases to improve the robustness of the learnt feature representations. Our goal is to accurately recognize a…

    Submitted 5 May, 2020; v1 submitted 9 January, 2020; originally announced January 2020.

    Comments: CVPR 2020