
Showing 1–50 of 213 results for author: Arora, S

Searching in archive cs.
  1. arXiv:2407.05483

    cs.CL cs.LG

    Just read twice: closing the recall gap for recurrent language models

    Authors: Simran Arora, Aman Timalsina, Aaryan Singhal, Benjamin Spector, Sabri Eyuboglu, Xinyi Zhao, Ashish Rao, Atri Rudra, Christopher Ré

    Abstract: Recurrent large language models that compete with Transformers in language modeling perplexity are emerging at a rapid rate (e.g., Mamba, RWKV). Excitingly, these architectures use a constant amount of memory during inference. However, due to the limited memory, recurrent LMs cannot recall and use all the information in long contexts leading to brittle in-context learning (ICL) quality. A key chal…

    Submitted 7 July, 2024; originally announced July 2024.

  2. arXiv:2407.03931

    eess.IV cs.CV

    LeDNet: Localization-enabled Deep Neural Network for Multi-Label Radiography Image Classification

    Authors: Lalit Pant, Shubham Arora

    Abstract: Multi-label radiography image classification has long been a topic of interest in neural networks research. In this paper, we intend to classify such images using convolution neural networks with novel localization techniques. We will use the chest x-ray images to detect thoracic diseases for this purpose. For accurate diagnosis, it is crucial to train the network with good quality images. But man…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: 6 pages, 7 figures

  3. arXiv:2406.18521

    cs.CL cs.CV

    CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs

    Authors: Zirui Wang, Mengzhou Xia, Luxi He, Howard Chen, Yitao Liu, Richard Zhu, Kaiqu Liang, Xindi Wu, Haotian Liu, Sadhika Malladi, Alexis Chevalier, Sanjeev Arora, Danqi Chen

    Abstract: Chart understanding plays a pivotal role when applying Multimodal Large Language Models (MLLMs) to real-world tasks such as analyzing scientific papers or financial reports. However, existing datasets often focus on oversimplified and homogeneous charts with template-based questions, leading to an over-optimistic measure of progress. We demonstrate that although open-source models can appear to ou…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 121 pages, 90 figures

  4. arXiv:2406.17761

    cs.CL cs.AI cs.LG

    CaLMQA: Exploring culturally specific long-form question answering across 23 languages

    Authors: Shane Arora, Marzena Karpinska, Hung-Ting Chen, Ipsita Bhattacharjee, Mohit Iyyer, Eunsol Choi

    Abstract: Large language models (LLMs) are used for long-form question answering (LFQA), which requires them to generate paragraph-length answers to complex questions. While LFQA has been well-studied in English, this research has not been extended to other languages. To bridge this gap, we introduce CaLMQA, a collection of 1.5K complex culturally specific questions spanning 23 languages and 51 culturally a…

    Submitted 3 July, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

    Comments: 39 pages, 17 figures. Code and data available at https://github.com/2015aroras/CaLMQA. Revised argument in section 4, results unchanged

  5. arXiv:2406.16107

    eess.AS cs.CL

    Decoder-only Architecture for Streaming End-to-end Speech Recognition

    Authors: Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe

    Abstract: Decoder-only language models (LMs) have been successfully adopted for speech-processing tasks including automatic speech recognition (ASR). The LMs have ample expressiveness and perform efficiently. This efficiency is a suitable characteristic for streaming applications of ASR. In this work, we propose to use a decoder-only architecture for blockwise streaming ASR. In our approach, speech features…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: Accepted for Interspeech 2024

  6. arXiv:2406.12611

    cs.SD cs.CL eess.AS

    Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting

    Authors: Yosuke Kashiwagi, Hayato Futami, Emiru Tsunoo, Siddhant Arora, Shinji Watanabe

    Abstract: End-to-end multilingual speech recognition models handle multiple languages through a single model, often incorporating language identification to automatically detect the language of incoming speech. Since the common scenario is where the language is already known, these models can perform as language-specific by using language information as prompts, which is particularly beneficial for attentio…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  7. arXiv:2406.12317

    cs.CL eess.AS

    Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model

    Authors: Hayato Futami, Siddhant Arora, Yosuke Kashiwagi, Emiru Tsunoo, Shinji Watanabe

    Abstract: Recently, multi-task spoken language understanding (SLU) models have emerged, designed to address various speech processing tasks. However, these models often rely on a large number of parameters. Also, they often encounter difficulties in adapting to new data for a specific task without experiencing catastrophic forgetting of previously trained tasks. In this study, we propose finding task-specif…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  8. arXiv:2406.10083

    cs.CL cs.SD eess.AS

    On the Evaluation of Speech Foundation Models for Spoken Language Understanding

    Authors: Siddhant Arora, Ankita Pasad, Chung-Ming Chien, Jionghao Han, Roshan Sharma, Jee-weon Jung, Hira Dhamyal, William Chen, Suwon Shon, Hung-yi Lee, Karen Livescu, Shinji Watanabe

    Abstract: The Spoken Language Understanding Evaluation (SLUE) suite of benchmark tasks was recently introduced to address the need for open resources and benchmarking of complex spoken language understanding (SLU) tasks, including both classification and sequence generation tasks, on natural speech. The benchmark has demonstrated preliminary success in using pre-trained speech foundation models (SFM) for th…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted at ACL Findings 2024

  9. arXiv:2406.05339

    eess.AS cs.AI

    To what extent can ASV systems naturally defend against spoofing attacks?

    Authors: Jee-weon Jung, Xin Wang, Nicholas Evans, Shinji Watanabe, Hye-jin Shim, Hemlata Tak, Sidhhant Arora, Junichi Yamagishi, Joon Son Chung

    Abstract: The current automatic speaker verification (ASV) task involves making binary decisions on two types of trials: target and non-target. However, emerging advancements in speech generation technology pose significant threats to the reliability of ASV systems. This study investigates whether ASV effortlessly acquires robustness against spoofing attacks (i.e., zero-shot capability) by systematically ex…

    Submitted 14 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: 5 pages, 3 figures, 3 tables, Interspeech 2024

  10. arXiv:2406.00869

    cs.RO

    Using 3-D LiDAR Data for Safe Physical Human-Robot Interaction

    Authors: Sarthak Arora, Karthik Subramanian, Odysseus Adamides, Ferat Sahin

    Abstract: This paper explores the use of 3D lidar in a physical Human-Robot Interaction (pHRI) scenario. To achieve the aforementioned, experiments were conducted to mimic a modern shop-floor environment. Data was collected from a pool of seventeen participants while performing pre-determined tasks in a shared workspace with the robot. To demonstrate an end-to-end case; a perception pipeline was developed t…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: Submitted to IEEE-CASE 2024. Under Review

  11. arXiv:2405.12205

    cs.AI cs.LG

    Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving

    Authors: Aniket Didolkar, Anirudh Goyal, Nan Rosemary Ke, Siyuan Guo, Michal Valko, Timothy Lillicrap, Danilo Rezende, Yoshua Bengio, Michael Mozer, Sanjeev Arora

    Abstract: Metacognitive knowledge refers to humans' intuitive knowledge of their own thinking and reasoning processes. Today's best LLMs clearly possess some reasoning processes. The paper gives evidence that they also have metacognitive knowledge, including ability to name skills and procedures to apply given a task. We explore this primarily in context of math reasoning, developing a prompt-guided interac…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: Preprint. Under review

  12. arXiv:2405.06787

    quant-ph cs.CR

    A computational test of quantum contextuality, and even simpler proofs of quantumness

    Authors: Atul Singh Arora, Kishor Bharti, Alexandru Cojocaru, Andrea Coladangelo

    Abstract: Bell non-locality is a fundamental feature of quantum mechanics whereby measurements performed on "spatially separated" quantum systems can exhibit correlations that cannot be understood as revealing predetermined values. This is a special case of the more general phenomenon of "quantum contextuality", which says that such correlations can occur even when the measurements are not necessarily on se…

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: 69 pages, 6 figures. For updates see https://atulsingharora.github.io/PoC

  13. arXiv:2405.00201

    cs.CL cs.AI

    SPAFIT: Stratified Progressive Adaptation Fine-tuning for Pre-trained Large Language Models

    Authors: Samir Arora, Liangliang Wang

    Abstract: Full fine-tuning is a popular approach to adapt Transformer-based pre-trained large language models to a specific downstream task. However, the substantial requirements for computational power and storage have discouraged its widespread use. Moreover, increasing evidence of catastrophic forgetting and overparameterization in the Transformer architecture has motivated researchers to seek more effic…

    Submitted 30 April, 2024; originally announced May 2024.

  14. arXiv:2404.17079

    quant-ph cs.CR

    Improving device-independent weak coin flipping protocols

    Authors: Atul Singh Arora, Jamie Sikora, Thomas Van Himbeeck

    Abstract: Weak coin flipping is the cryptographic task where Alice and Bob remotely flip a coin but want opposite outcomes. This work studies this task in the device-independent regime where Alice and Bob neither trust each other, nor their quantum devices. The best protocol was devised over a decade ago by Silman, Chailloux, Aharon, Kerenidis, Pironio, and Massar with bias $\varepsilon \approx 0.33664$, wh…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 25 pages, 7 figures

  15. arXiv:2404.16831

    cs.CV

    The Third Monocular Depth Estimation Challenge

    Authors: Jaime Spencer, Fabio Tosi, Matteo Poggi, Ripudaman Singh Arora, Chris Russell, Simon Hadfield, Richard Bowden, GuangYuan Zhou, ZhengXin Li, Qiang Rao, YiPing Bao, Xiao Liu, Dohyeong Kim, Jinseong Kim, Myunghyun Kim, Mykola Lavreniuk, Rui Li, Qing Mao, Jiang Wu, Yu Zhu, Jinqiu Sun, Yanning Zhang, Suraj Patni, Aradhye Agarwal, Chetan Arora, et al. (16 additional authors not shown)

    Abstract: This paper discusses the results of the third edition of the Monocular Depth Estimation Challenge (MDEC). The challenge focuses on zero-shot generalization to the challenging SYNS-Patches dataset, featuring complex scenes in natural and indoor settings. As with the previous edition, methods can use any form of supervision, i.e. supervised or self-supervised. The challenge received a total of 19 su…

    Submitted 27 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: To appear in CVPRW 2024

  16. arXiv:2403.09603

    cs.CR cs.AI cs.LG

    Optimistic Verifiable Training by Controlling Hardware Nondeterminism

    Authors: Megha Srivastava, Simran Arora, Dan Boneh

    Abstract: The increasing compute demands of AI systems has led to the emergence of services that train models on behalf of clients lacking necessary resources. However, ensuring correctness of training and guarding against potential training-time attacks, such as data poisoning, poses challenges. Existing works on verifiable training largely fall into two classes: proof-based systems, which struggle to scal…

    Submitted 16 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: 11 pages, 5 figures, preprint

  17. arXiv:2403.00887

    eess.AS cs.AI cs.CL cs.LG cs.SD

    SEGAA: A Unified Approach to Predicting Age, Gender, and Emotion in Speech

    Authors: Aron R, Indra Sigicharla, Chirag Periwal, Mohanaprasad K, Nithya Darisini P S, Sourabh Tiwari, Shivani Arora

    Abstract: The interpretation of human voices holds importance across various applications. This study ventures into predicting age, gender, and emotion from vocal cues, a field with vast applications. Voice analysis tech advancements span domains, from improving customer interactions to enhancing healthcare and retail experiences. Discerning emotions aids mental health, while age and gender detection are vi…

    Submitted 1 March, 2024; originally announced March 2024.

  18. arXiv:2402.18668

    cs.CL cs.LG

    Simple linear attention language models balance the recall-throughput tradeoff

    Authors: Simran Arora, Sabri Eyuboglu, Michael Zhang, Aman Timalsina, Silas Alberti, Dylan Zinsley, James Zou, Atri Rudra, Christopher Ré

    Abstract: Recent work has shown that attention-based language models excel at recall, the ability to ground generations in tokens previously seen in context. However, the efficiency of attention-based models is bottle-necked during inference by the KV-cache's aggressive memory consumption. In this work, we explore whether we can improve language model efficiency (e.g. by reducing memory consumption) without…

    Submitted 28 February, 2024; originally announced February 2024.

  19. arXiv:2402.18540

    cs.LG cs.AI cs.CL

    Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates

    Authors: Kaifeng Lyu, Haoyu Zhao, Xinran Gu, Dingli Yu, Anirudh Goyal, Sanjeev Arora

    Abstract: Public LLMs such as the Llama 2-Chat have driven huge activity in LLM research. These models underwent alignment training and were considered safe. Recently Qi et al. (2023) reported that even benign fine-tuning (e.g., on seemingly safe datasets) can give rise to unsafe behaviors in the models. The current paper is about methods and best practices to mitigate such loss of alignment. Through extens…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 20 pages

  20. arXiv:2402.16021

    cs.CL cs.AI cs.CV eess.AS

    TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages

    Authors: Minsu Kim, Jee-weon Jung, Hyeongseop Rha, Soumi Maiti, Siddhant Arora, Xuankai Chang, Shinji Watanabe, Yong Man Ro

    Abstract: The capability to jointly process multi-modal information is becoming an essential task. However, the limited number of paired multi-modal data and the large computational requirements in multi-modal learning hinder the development. We propose a novel Tri-Modal Translation (TMT) model that translates between arbitrary modalities spanning speech, image, and text. We introduce a novel viewpoint, whe…

    Submitted 25 February, 2024; originally announced February 2024.

  21. arXiv:2402.15855

    quant-ph cs.CR

    Protocols for Quantum Weak Coin Flipping

    Authors: Atul Singh Arora, Jérémie Roland, Chrysoula Vlachou, Stephan Weis

    Abstract: Weak coin flipping is an important cryptographic primitive -- it is the strongest known secure two-party computation primitive that classically becomes secure only under certain assumptions (e.g. computational hardness), while quantumly there exist protocols that achieve arbitrarily close to perfect security. This breakthrough result was established by Mochon in 2007 [arXiv:0711.4114]. However, hi…

    Submitted 24 February, 2024; originally announced February 2024.

    Comments: 51 pages (+ 9 appendix), 12 figures. This is a self-contained, concise version of our main results in arXiv:1811.02984 (STOC '19) and arXiv:1911.13283v2 (SODA '21). The Cryptology ePrint 2022/1101 is the comprehensive version, subsuming the above

  22. arXiv:2402.11111

    cs.CL

    Language Models as Science Tutors

    Authors: Alexis Chevalier, Jiayi Geng, Alexander Wettig, Howard Chen, Sebastian Mizera, Toni Annala, Max Jameson Aragon, Arturo Rodríguez Fanlo, Simon Frieder, Simon Machado, Akshara Prabhakar, Ellie Thieu, Jiachen T. Wang, Zirui Wang, Xindi Wu, Mengzhou Xia, Wenhan Jia, Jiatong Yu, Jun-Jie Zhu, Zhiyong Jason Ren, Sanjeev Arora, Danqi Chen

    Abstract: NLP has recently made exciting progress toward training language models (LMs) with strong scientific problem-solving skills. However, model development has not focused on real-life use-cases of LMs for science, including applications in education that require processing long scientific documents. To address this, we introduce TutorEval and TutorChat. TutorEval is a diverse question-answering bench…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: 8 pages without bibliography and appendix, 26 pages total

  23. arXiv:2402.07440

    cs.IR cs.LG

    Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT

    Authors: Jon Saad-Falcon, Daniel Y. Fu, Simran Arora, Neel Guha, Christopher Ré

    Abstract: Retrieval pipelines, an integral component of many machine learning systems, perform poorly in domains where documents are long (e.g., 10K tokens or more) and where identifying the relevant document requires synthesizing information across the entire text. Developing long-context retrieval encoders suitable for these domains raises three challenges: (1) how to evaluate long-context retrieval perform…

    Submitted 13 February, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

  24. arXiv:2402.04333

    cs.CL cs.AI cs.LG

    LESS: Selecting Influential Data for Targeted Instruction Tuning

    Authors: Mengzhou Xia, Sadhika Malladi, Suchin Gururangan, Sanjeev Arora, Danqi Chen

    Abstract: Instruction tuning has unlocked powerful capabilities in large language models (LLMs), effectively using combined datasets to develop general-purpose chatbots. However, real-world applications often require a specialized suite of skills (e.g., reasoning). The challenge lies in identifying the most relevant data from these extensive datasets to effectively develop specific capabilities, a setting we…

    Submitted 12 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: ICML 2024; Code and data are available at https://github.com/princeton-nlp/LESS

  25. arXiv:2402.00838

    cs.CL

    OLMo: Accelerating the Science of Language Models

    Authors: Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, et al. (18 additional authors not shown)

    Abstract: Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off, gated behind proprietary interfaces, with important details of their training data, architectures, and development undisclosed. Given the importance of these details in scientifically studying these models…

    Submitted 7 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  26. arXiv:2401.16658

    cs.CL eess.AS

    OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer

    Authors: Yifan Peng, Jinchuan Tian, William Chen, Siddhant Arora, Brian Yan, Yui Sudo, Muhammad Shakeel, Kwanghee Choi, Jiatong Shi, Xuankai Chang, Jee-weon Jung, Shinji Watanabe

    Abstract: Recent studies have highlighted the importance of fully open foundation models. The Open Whisper-style Speech Model (OWSM) is an initial step towards reproducing OpenAI Whisper using public data and open-source toolkits. However, previous versions of OWSM (v1 to v3) are still based on standard Transformer, which might lead to inferior performance compared to state-of-the-art speech encoder archite…

    Submitted 16 June, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: Accepted at INTERSPEECH 2024. Webpage: https://www.wavlab.org/activities/2024/owsm/

  27. arXiv:2401.08520

    cs.CR cs.CE

    SecPLF: Secure Protocols for Loanable Funds against Oracle Manipulation Attacks

    Authors: Sanidhay Arora, Yingjiu Li, Yebo Feng, Jiahua Xu

    Abstract: The evolving landscape of Decentralized Finance (DeFi) has raised critical security concerns, especially pertaining to Protocols for Loanable Funds (PLFs) and their dependency on price oracles, which are susceptible to manipulation. The emergence of flash loans has further amplified these risks, enabling increasingly complex oracle manipulation attacks that can lead to significant financial losses…

    Submitted 16 January, 2024; originally announced January 2024.

  28. arXiv:2401.00353

    cs.IR

    EXPLORE -- Explainable Song Recommendation

    Authors: Abhinav Arun, Mehul Soni, Palash Choudhary, Saksham Arora

    Abstract: This study explores the development of an explainable music recommendation system with enhanced user control. Leveraging a hybrid of collaborative filtering and content-based filtering, we address the challenges of opaque recommendation logic and lack of user influence on results. We present a novel approach combining advanced algorithms and an interactive user interface. Our methodology integrate…

    Submitted 30 December, 2023; originally announced January 2024.

    Comments: 6 pages, 7 figures

  29. arXiv:2312.11805

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  30. arXiv:2312.09582

    cs.CL cs.SD eess.AS

    Phoneme-aware Encoding for Prefix-tree-based Contextual ASR

    Authors: Hayato Futami, Emiru Tsunoo, Yosuke Kashiwagi, Hiroaki Ogawa, Siddhant Arora, Shinji Watanabe

    Abstract: In speech recognition applications, it is important to recognize context-specific rare words, such as proper nouns. Tree-constrained Pointer Generator (TCPGen) has shown promise for this purpose, which efficiently biases such words with a prefix tree. While the original TCPGen relies on grapheme-based encoding, we propose extending it with phoneme-aware encoding to better recognize words of unusua…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: Accepted to ICASSP 2024

  31. arXiv:2312.04927

    cs.CL cs.LG

    Zoology: Measuring and Improving Recall in Efficient Language Models

    Authors: Simran Arora, Sabri Eyuboglu, Aman Timalsina, Isys Johnson, Michael Poli, James Zou, Atri Rudra, Christopher Ré

    Abstract: Attention-free language models that combine gating and convolutions are growing in popularity due to their efficiency and increasingly competitive performance. To better understand these architectures, we pretrain a suite of 17 attention and "gated-convolution" language models, finding that SoTA gated-convolution architectures still underperform attention by up to 2.1 perplexity points on the Pile…

    Submitted 8 December, 2023; originally announced December 2023.

  32. RELIC: Investigating Large Language Model Responses using Self-Consistency

    Authors: Furui Cheng, Vilém Zouhar, Simran Arora, Mrinmaya Sachan, Hendrik Strobelt, Mennatallah El-Assady

    Abstract: Large Language Models (LLMs) are notorious for blending fact with fiction and generating non-factual content, known as hallucinations. To address this challenge, we propose an interactive system that helps users gain insight into the reliability of the generated text. Our approach is based on the idea that the self-consistency of multiple samples generated by the same LLM relates to its confidence…

    Submitted 4 April, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

  33. arXiv:2311.15268

    cs.LG cs.AI

    Unlearning via Sparse Representations

    Authors: Vedant Shah, Frederik Träuble, Ashish Malik, Hugo Larochelle, Michael Mozer, Sanjeev Arora, Yoshua Bengio, Anirudh Goyal

    Abstract: Machine unlearning, which involves erasing knowledge about a forget set from a trained model, can prove to be costly and infeasible by existing techniques. We propose a nearly compute-free zero-shot unlearning technique based on a discrete representational bottleneck. We show that the proposed technique efficiently unlearns the forget set and incurs negligible damage to the model's p…

    Submitted 26 November, 2023; originally announced November 2023.

  34. arXiv:2310.17567

    cs.CL cs.AI cs.LG cs.NE

    Skill-Mix: a Flexible and Expandable Family of Evaluations for AI models

    Authors: Dingli Yu, Simran Kaur, Arushi Gupta, Jonah Brown-Cohen, Anirudh Goyal, Sanjeev Arora

    Abstract: With LLMs shifting their role from statistical modeling of language to serving as general-purpose AI agents, how should LLM evaluations change? Arguably, a key ability of an AI agent is to flexibly combine, as needed, the basic skills it has learned. The capability to combine skills plays an important role in (human) pedagogy and also in a paper on emergence phenomena (Arora & Goyal, 2023). This…

    Submitted 26 October, 2023; originally announced October 2023.

  35. arXiv:2310.14423

    cs.LG

    A Quadratic Synchronization Rule for Distributed Deep Learning

    Authors: Xinran Gu, Kaifeng Lyu, Sanjeev Arora, Jingzhao Zhang, Longbo Huang

    Abstract: In distributed deep learning with data parallelism, synchronizing gradients at each training step can cause a huge communication overhead, especially when many nodes work together to train large models. Local gradient methods, such as Local SGD, address this issue by allowing workers to compute locally for $H$ steps without synchronizing with others, hence reducing communication frequency. While…

    Submitted 12 April, 2024; v1 submitted 22 October, 2023; originally announced October 2023.

    Comments: camera-ready version for ICLR'24

  36. arXiv:2310.12150

    cs.CL

    Understanding Retrieval Augmentation for Long-Form Question Answering

    Authors: Hung-Ting Chen, Fangyuan Xu, Shane Arora, Eunsol Choi

    Abstract: We present a study of retrieval-augmented language models (LMs) on long-form question answering. We analyze how retrieval augmentation impacts different LMs, by comparing answers generated from models while using the same evidence documents, and how differing quality of retrieval document set impacts the answers generated from the same LM. We study various attributes of generated answers (e.g., fl…

    Submitted 18 October, 2023; originally announced October 2023.

  37. arXiv:2310.12109

    cs.LG

    Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture

    Authors: Daniel Y. Fu, Simran Arora, Jessica Grogan, Isys Johnson, Sabri Eyuboglu, Armin W. Thomas, Benjamin Spector, Michael Poli, Atri Rudra, Christopher Ré

    Abstract: Machine learning models are increasingly being scaled in both sequence length and model dimension to reach longer contexts and better performance. However, existing architectures such as Transformers scale quadratically along both these axes. We ask: are there performant architectures that can scale sub-quadratically along sequence length and model dimension? We introduce Monarch Mixer (M2), a new…

    Submitted 18 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023 (Oral)

  38. arXiv:2310.08507

    cs.SE

    Yuga: Automatically Detecting Lifetime Annotation Bugs in the Rust Language

    Authors: Vikram Nitin, Anne Mulhern, Sanjay Arora, Baishakhi Ray

    Abstract: The Rust programming language is becoming increasingly popular among systems programmers due to its efficient performance and robust memory safety guarantees. Rust employs an ownership model to ensure this guarantee by allowing each value to be owned by only one identifier at a time. Additionally, it introduces the concept of borrowing and lifetimes to enable other variables to borrow the values u…

    Submitted 12 October, 2023; originally announced October 2023.

  39. arXiv:2310.04546

    cs.CR

    Privacy-Preserving Financial Anomaly Detection via Federated Learning & Multi-Party Computation

    Authors: Sunpreet Arora, Andrew Beams, Panagiotis Chatzigiannis, Sebastian Meiser, Karan Patel, Srinivasan Raghuraman, Peter Rindal, Harshal Shah, Yizhen Wang, Yuhang Wu, Hao Yang, Mahdi Zamani

    Abstract: One of the main goals of financial institutions (FIs) today is combating fraud and financial crime. To this end, FIs use sophisticated machine-learning models trained using data collected from their customers. The output of machine learning models may be manually reviewed for critical use cases, e.g., determining the likelihood of a transaction being anomalous and the subsequent course of action.…

    Submitted 6 October, 2023; originally announced October 2023.

    Comments: 12 pages

  40. arXiv:2310.03285

    cs.LG cs.CR

    Burning the Adversarial Bridges: Robust Windows Malware Detection Against Binary-level Mutations

    Authors: Ahmed Abusnaina, Yizhen Wang, Sunpreet Arora, Ke Wang, Mihai Christodorescu, David Mohaisen

    Abstract: Toward robust malware detection, we explore the attack surface of existing malware detection systems. We conduct root-cause analyses of the practical binary-level black-box adversarial malware examples. Additionally, we uncover the sensitivity of volatile features within the detection engines and exhibit their exploitability. Highlighting volatile information channels within the software, we intro…

    Submitted 4 October, 2023; originally announced October 2023.

    Comments: 12 pages

  41. arXiv:2310.02973  [pdf, other]

    cs.CL cs.SD eess.AS

    UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions

    Authors: Siddhant Arora, Hayato Futami, Jee-weon Jung, Yifan Peng, Roshan Sharma, Yosuke Kashiwagi, Emiru Tsunoo, Karen Livescu, Shinji Watanabe

    Abstract: Recent studies leverage large language models with multi-tasking capabilities, using natural language prompts to guide the model's behavior and surpassing the performance of task-specific models. Motivated by this, we ask: can we build a single model that jointly performs various spoken language understanding (SLU) tasks? We start by adapting a pre-trained automatic speech recognition model to additio…

    Submitted 3 April, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Accepted at NAACL 2024

  42. arXiv:2309.13876  [pdf, other]

    cs.CL cs.SD eess.AS

    Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data

    Authors: Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-weon Jung, Soumi Maiti, Shinji Watanabe

    Abstract: Pre-training speech models on large volumes of data has achieved remarkable success. OpenAI Whisper is a multilingual multitask model trained on 680k hours of supervised speech data. It generalizes well to various speech recognition and translation benchmarks even in a zero-shot setup. However, the full pipeline for developing such models (from data collection to training) is not publicly accessib…

    Submitted 24 October, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

    Comments: Accepted at ASRU 2023

  43. arXiv:2309.10926  [pdf, other]

    cs.CL cs.SD eess.AS

    Semi-Autoregressive Streaming ASR With Label Context

    Authors: Siddhant Arora, George Saon, Shinji Watanabe, Brian Kingsbury

    Abstract: Non-autoregressive (NAR) modeling has gained significant interest in speech processing since these models achieve dramatically lower inference time than autoregressive (AR) models while also achieving good transcription accuracy. Since NAR automatic speech recognition (ASR) models must wait for the completion of the entire utterance before processing, some works explore streaming NAR models based…

    Submitted 20 February, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

    Comments: Accepted at ICASSP 2024

  44. arXiv:2309.09510  [pdf, ps, other]

    eess.AS cs.LG cs.SD

    Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech

    Authors: Chien-yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, Hung-yi Lee

    Abstract: Text language models have shown remarkable zero-shot capability in generalizing to unseen tasks when provided with well-formulated instructions. However, existing studies in speech processing primarily focus on limited or specific tasks. Moreover, the lack of standardized benchmarks hinders a fair comparison across different approaches. Thus, we present Dynamic-SUPERB, a benchmark designed for bui…

    Submitted 22 March, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

    Comments: To appear in the proceedings of ICASSP 2024

  45. arXiv:2309.08876  [pdf, ps, other]

    eess.AS cs.SD

    Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation

    Authors: Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe

    Abstract: Collecting audio-text pairs is expensive; however, it is much easier to access text-only data. Unless using shallow fusion, end-to-end automatic speech recognition (ASR) models require architecture modifications or additional training schemes to use text-only data. Inspired by recent advances in decoder-only language models (LMs), such as GPT-3 and PaLM adopted for speech-processing tasks, we prop…

    Submitted 9 January, 2024; v1 submitted 16 September, 2023; originally announced September 2023.

  46. arXiv:2309.04664  [pdf, other]

    cs.CR cs.LG

    Compact: Approximating Complex Activation Functions for Secure Computation

    Authors: Mazharul Islam, Sunpreet S. Arora, Rahul Chatterjee, Peter Rindal, Maliheh Shirvanian

    Abstract: Secure multi-party computation (MPC) techniques can be used to provide data privacy when users query deep neural network (DNN) models hosted on a public cloud. State-of-the-art MPC techniques can be directly leveraged for DNN models that use simple activation functions such as ReLU. However, these techniques are ineffective and/or inefficient for the complex and highly non-linear activation functi…

    Submitted 17 March, 2024; v1 submitted 8 September, 2023; originally announced September 2023.

    Comments: Accepted to Proceedings on Privacy Enhancing Technologies (PoPETs)

  47. arXiv:2308.10327  [pdf, other]

    quant-ph cs.LG physics.comp-ph physics.data-an

    Quantum State Tomography using Quantum Machine Learning

    Authors: Nouhaila Innan, Owais Ishtiaq Siddiqui, Shivang Arora, Tamojit Ghosh, Yasemin Poyraz Koçak, Dominic Paragas, Abdullah Al Omar Galib, Muhammad Al-Zafar Khan, Mohamed Bennai

    Abstract: Quantum State Tomography (QST) is a fundamental technique in Quantum Information Processing (QIP) for reconstructing unknown quantum states. However, the conventional QST methods are limited by the number of measurements required, which makes them impractical for large-scale quantum systems. To overcome this challenge, we propose the integration of Quantum Machine Learning (QML) techniques to enha…

    Submitted 20 August, 2023; originally announced August 2023.

    Comments: 18 pages, 19 figures

    Journal ref: Quantum Mach. Intell. 6, 28 (2024)

  48. arXiv:2307.15936  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    A Theory for Emergence of Complex Skills in Language Models

    Authors: Sanjeev Arora, Anirudh Goyal

    Abstract: A major driver of AI products today is the fact that new skills emerge in language models when their parameter set and training corpora are scaled up. This phenomenon is poorly understood, and a mechanistic explanation via mathematical analysis of gradient-based training seems difficult. The current paper takes a different approach, analysing emergence using the famous (and empirical) Scaling Laws…

    Submitted 5 November, 2023; v1 submitted 29 July, 2023; originally announced July 2023.

  49. arXiv:2307.12767  [pdf, ps, other]

    eess.AS cs.SD

    Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-decoder Speech Recognition

    Authors: Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe

    Abstract: Although frame-based models, such as CTC and transducers, have an affinity for streaming automatic speech recognition, their decoding uses no future knowledge, which could lead to incorrect pruning. Conversely, label-based attention encoder-decoder mitigates this issue using soft attention to the input, while it tends to overestimate labels biased towards its training domain, unlike CTC. We exploi…

    Submitted 24 July, 2023; originally announced July 2023.

    Comments: Accepted for Interspeech 2023

  50. arXiv:2307.11005  [pdf, other]

    cs.CL cs.SD eess.AS

    Integrating Pretrained ASR and LM to Perform Sequence Generation for Spoken Language Understanding

    Authors: Siddhant Arora, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, Shinji Watanabe

    Abstract: There has been an increased interest in the integration of pretrained speech recognition (ASR) and language models (LM) into the SLU framework. However, prior methods often struggle with a vocabulary mismatch between pretrained models, and LM cannot be directly utilized as they diverge from its NLU formulation. In this study, we propose a three-pass end-to-end (E2E) SLU system that effectively int…

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: Accepted at INTERSPEECH 2023