Showing 1–50 of 156 results for author: Mehta, S

Searching in archive cs.
  1. arXiv:2408.06246  [pdf, other]

    cs.RO

    Stable-BC: Controlling Covariate Shift with Stable Behavior Cloning

    Authors: Shaunak A. Mehta, Yusuf Umut Ciftci, Balamurugan Ramachandran, Somil Bansal, Dylan P. Losey

    Abstract: Behavior cloning is a common imitation learning paradigm. Under behavior cloning the robot collects expert demonstrations, and then trains a policy to match the actions taken by the expert. This works well when the robot learner visits states where the expert has already demonstrated the correct action; but inevitably the robot will also encounter new states outside of its training dataset. If the…

    Submitted 12 August, 2024; originally announced August 2024.

  2. arXiv:2407.16875  [pdf, other]

    cs.CV

    PathwayBench: Assessing Routability of Pedestrian Pathway Networks Inferred from Multi-City Imagery

    Authors: Yuxiang Zhang, Bill Howe, Sachin Mehta, Nicholas-J Bolten, Anat Caspi

    Abstract: Applications to support pedestrian mobility in urban areas require a complete and routable graph representation of the built environment. Globally available information, including aerial imagery, provides a scalable source for constructing these path networks, but the associated learning problem is challenging: Relative to road network pathways, pedestrian network pathways are narrower, more frequ…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2303.02323

  3. arXiv:2407.14057  [pdf, other]

    cs.CL cs.AI cs.LG

    LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference

    Authors: Qichen Fu, Minsik Cho, Thomas Merth, Sachin Mehta, Mohammad Rastegari, Mahyar Najibi

    Abstract: The inference of transformer-based large language models consists of two sequential stages: 1) a prefilling stage to compute the KV cache of prompts and generate the first token, and 2) a decoding stage to generate subsequent tokens. For long prompts, the KV cache must be computed for all tokens during the prefilling stage, which can significantly increase the time needed to generate the first tok…

    Submitted 19 July, 2024; originally announced July 2024.

  4. arXiv:2407.06125  [pdf, other]

    cs.HC cs.AI

    Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities

    Authors: Avinash Anand, Chayan Tank, Sarthak Pol, Vinayak Katoch, Shaina Mehta, Rajiv Ratn Shah

    Abstract: Depression has proven to be a significant public health issue, profoundly affecting the psychological well-being of individuals. If it remains undiagnosed, depression can lead to severe health issues, which can manifest physically and even lead to suicide. Generally, diagnosing depression or any other mental disorder involves conducting semi-structured interviews alongside supplementary questionna…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 12 pages, 9 figures, 9 tables

  5. arXiv:2406.14290  [pdf, ps, other]

    cs.CY cs.SI

    Examining the Implications of Deepfakes for Election Integrity

    Authors: Hriday Ranka, Mokshit Surana, Neel Kothari, Veer Pariawala, Pratyay Banerjee, Aditya Surve, Sainath Reddy Sankepally, Raghav Jain, Jhagrut Lalwani, Swapneel Mehta

    Abstract: It is becoming cheaper to launch disinformation operations at scale using AI-generated content, in particular 'deepfake' technology. We have observed instances of deepfakes in political campaigns, where generated content is employed to both bolster the credibility of certain narratives (reinforcing outcomes) and manipulate public perception to the detriment of targeted candidates or causes (advers…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted at the AAAI 2024 conference, AI for Credible Elections Workshop-AI4CE 2024

  6. How Decentralization Affects User Agency on Social Platforms

    Authors: Aditya Surve, Aneesh Shamraj, Swapneel Mehta

    Abstract: Mainstream social media platforms function as "walled garden" ecosystems that restrict user agency, control, and data portability. They have demonstrated a lack of transparency that contributes to a multitude of online harms. Our research investigates how decentralization might present promise as an alternative model to walled garden platforms. Specifically, we describe the user-driven content mod…

    Submitted 13 June, 2024; originally announced June 2024.

    Journal ref: ICWSM DATA CHALLENGE 2024

  7. arXiv:2406.05401  [pdf, other]

    eess.AS cs.HC cs.SD

    Should you use a probabilistic duration model in TTS? Probably! Especially for spontaneous speech

    Authors: Shivam Mehta, Harm Lameris, Rajiv Punmiya, Jonas Beskow, Éva Székely, Gustav Eje Henter

    Abstract: Converting input symbols to output audio in TTS requires modelling the durations of speech sounds. Leading non-autoregressive (NAR) TTS models treat duration modelling as a regression problem. The same utterance is then spoken with identical timings every time, unlike when a human speaks. Probabilistic models of duration have been proposed, but there is mixed evidence of their benefits. However, p…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: 5 pages, 2 figures. Final version, accepted to Interspeech 2024

    MSC Class: 68T07 ACM Class: I.2.7; I.2.6; H.5.5

  8. arXiv:2406.03354  [pdf, other]

    cs.SI

    Can Social Media Platforms Transcend Political Labels? An Analysis of Neutral Conservations on Truth Social

    Authors: Chaitya Shah, Ritesh Konka, Gautam Malpani, Swapneel Mehta, Lynnette Hui Xian Ng

    Abstract: There is a prevailing perception that content on a social media platform generally has the same political leaning. These platforms are often viewed as ideologically congruent entities, reflecting the majority opinion of their users; a prime example of this is Truth Social. While this perception may exist, it is essential to verify the platform's credibility, acknowledging that such platforms cont…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 5 pages, 2 figures, 1 table

  9. arXiv:2405.04691  [pdf, other]

    cs.CR cs.LG

    Carbon Filter: Real-time Alert Triage Using Large Scale Clustering and Fast Search

    Authors: Jonathan Oliver, Raghav Batta, Adam Bates, Muhammad Adil Inam, Shelly Mehta, Shugao Xia

    Abstract: "Alert fatigue" is one of the biggest challenges faced by the Security Operations Center (SOC) today, with analysts spending more than half of their time reviewing false alerts. Endpoint detection products raise alerts by pattern matching on event telemetry against behavioral rules that describe potentially malicious behavior, but can suffer from high false positives that distract from actual atta…

    Submitted 7 May, 2024; originally announced May 2024.

  10. arXiv:2405.03674  [pdf, other]

    cs.HC

    Anti-Heroes: An Ethics-focused Method for Responsible Designer Intentions

    Authors: Shikha Mehta, Shruthi Sai Chivukula, Colin M. Gray, Ritika Gairola

    Abstract: HCI and design researchers have designed, adopted, and customized a range of ethics-focused methods to inscribe values and support ethical decision making in a design process. In this work-in-progress, we add to this body of resources, constructing a method that surfaces the designer's intentions in an action-focused way, encouraging consideration of both manipulative and value-centered roles. Ant…

    Submitted 6 May, 2024; originally announced May 2024.

  11. arXiv:2404.19622  [pdf, other]

    cs.HC cs.CV cs.GR cs.SD eess.AS

    Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis

    Authors: Shivam Mehta, Anna Deichler, Jim O'Regan, Birger Moëll, Jonas Beskow, Gustav Eje Henter, Simon Alexanderson

    Abstract: Although humans engaged in face-to-face conversation simultaneously communicate both verbally and non-verbally, methods for joint and unified synthesis of speech audio and co-speech 3D gesture motion from text are a new and emerging field. These technologies hold great promise for more human-like, efficient, expressive, and robust synthetic communication, but are currently held back by the lack of…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: 13+1 pages, 2 figures, accepted at the Human Motion Generation workshop (HuMoGen) at CVPR 2024

    MSC Class: 68T07 (Primary); 68T42 (Secondary) ACM Class: I.2.7; I.2.6; H.5

  12. arXiv:2404.15653  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data

    Authors: Sachin Mehta, Maxwell Horton, Fartash Faghri, Mohammad Hossein Sekhavat, Mahyar Najibi, Mehrdad Farajtabar, Oncel Tuzel, Mohammad Rastegari

    Abstract: Contrastive learning has emerged as a transformative method for learning effective visual representations through the alignment of image and text embeddings. However, pairwise similarity computation in contrastive loss between image and text pairs poses computational challenges. This paper presents a novel weakly supervised pre-training of vision models on web-scale image-text data. The proposed m…

    Submitted 24 April, 2024; originally announced April 2024.

  13. arXiv:2404.14619  [pdf, other]

    cs.CL cs.AI cs.LG

    OpenELM: An Efficient Language Model Family with Open Training and Inference Framework

    Authors: Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari

    Abstract: The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and enabling investigations into data and model biases, as well as potential risks. To this end, we release OpenELM, a state-of-the-art open language model. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of…

    Submitted 1 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: Minor corrections

  14. arXiv:2404.13765  [pdf, other]

    cs.HC

    SciDaSynth: Interactive Structured Knowledge Extraction and Synthesis from Scientific Literature with Large Language Model

    Authors: Xingbo Wang, Samantha L. Huey, Rui Sheng, Saurabh Mehta, Fei Wang

    Abstract: Extraction and synthesis of structured knowledge from extensive scientific literature are crucial for advancing and disseminating scientific progress. Although many existing systems facilitate literature review and digest, they struggle to process multimodal, varied, and inconsistent information within and across the literature into structured data. We introduce SciDaSynth, a novel interactive sys…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: 15 pages, 7 figures

  15. arXiv:2404.13755  [pdf, other]

    cs.RO

    Combining and Decoupling Rigid and Soft Grippers to Enhance Robotic Manipulation

    Authors: Maya Keely, Yeunhee Kim, Shaunak A. Mehta, Joshua Hoegerman, Robert Ramirez Sanchez, Emily Paul, Camryn Mills, Dylan P. Losey, Michael D. Bartlett

    Abstract: For robot arms to perform everyday tasks in unstructured environments, these robots must be able to manipulate a diverse range of objects. Today's robots often grasp objects with either soft grippers or rigid end-effectors. However, purely rigid or purely soft grippers have fundamental limitations: soft grippers struggle with irregular, heavy objects, while rigid grippers often cannot grasp small,…

    Submitted 21 April, 2024; originally announced April 2024.

  16. arXiv:2404.04520  [pdf, other]

    cs.CL cs.AI cs.LG

    IITK at SemEval-2024 Task 4: Hierarchical Embeddings for Detection of Persuasion Techniques in Memes

    Authors: Shreenaga Chikoti, Shrey Mehta, Ashutosh Modi

    Abstract: Memes are one of the most popular types of content used in an online disinformation campaign. They are primarily effective on social media platforms since they can easily reach many users. Memes in a disinformation campaign achieve their goal of influencing the users through several rhetorical and psychological techniques, such as causal oversimplification, name-calling, and smear. The SemEval 202…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: Accepted at SemEval 2024, NAACL 2024; 9 pages

  17. arXiv:2403.13281  [pdf, other]

    cs.RO

    Waypoint-Based Reinforcement Learning for Robot Manipulation Tasks

    Authors: Shaunak A. Mehta, Soheil Habibian, Dylan P. Losey

    Abstract: Robot arms should be able to learn new tasks. One framework here is reinforcement learning, where the robot is given a reward function that encodes the task, and the robot autonomously learns actions to maximize its reward. Existing approaches to reinforcement learning often frame this problem as a Markov decision process, and learn a policy (or a hierarchy of policies) to complete the task. These…

    Submitted 19 March, 2024; originally announced March 2024.

  18. arXiv:2403.09806  [pdf, other]

    cs.AI

    xLP: Explainable Link Prediction for Master Data Management

    Authors: Balaji Ganesan, Matheen Ahmed Pasha, Srinivasa Parkala, Neeraj R Singh, Gayatri Mishra, Sumit Bhatia, Hima Patel, Somashekar Naganna, Sameep Mehta

    Abstract: Explaining neural model predictions to users requires creativity, especially in enterprise applications, where there are costs associated with users' time and their trust in the model predictions is critical for adoption. For link prediction in master data management, we have built a number of explainability solutions drawing from research in interpretability, fact verification, path ranking, neu…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: 8 pages, 4 figures, NeurIPS 2020 Competition and Demonstration Track. arXiv admin note: text overlap with arXiv:2012.05516

  19. arXiv:2403.06009  [pdf, other]

    cs.LG

    Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations

    Authors: Swapnaja Achintalwar, Adriana Alvarado Garcia, Ateret Anaby-Tavor, Ioana Baldini, Sara E. Berger, Bishwaranjan Bhattacharjee, Djallel Bouneffouf, Subhajit Chaudhury, Pin-Yu Chen, Lamogha Chiazor, Elizabeth M. Daly, Kirushikesh DB, Rogério Abreu de Paula, Pierre Dognin, Eitan Farchi, Soumya Ghosh, Michael Hind, Raya Horesh, George Kour, Ja Young Lee, Nishtha Madaan, Sameep Mehta, Erik Miehling, Keerthiram Murugesan, Manish Nagireddy, et al. (13 additional authors not shown)

    Abstract: Large language models (LLMs) are susceptible to a variety of risks, from non-faithful output to biased and toxic generations. Due to several limiting factors surrounding LLMs (training cost, API access, data availability, etc.), it may not always be feasible to impose direct safety constraints on a deployed model. Therefore, an efficient and reliable alternative is required. To this end, we presen…

    Submitted 19 August, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

  20. arXiv:2403.00826  [pdf, other]

    cs.CL cs.CR cs.LG

    LLMGuard: Guarding Against Unsafe LLM Behavior

    Authors: Shubh Goyal, Medha Hira, Shubham Mishra, Sukriti Goyal, Arnav Goel, Niharika Dadu, Kirushikesh DB, Sameep Mehta, Nishtha Madaan

    Abstract: Although the rise of Large Language Models (LLMs) in enterprise settings brings new opportunities and capabilities, it also brings challenges, such as the risk of generating inappropriate, biased, or misleading content that violates regulations and can have legal concerns. To alleviate this, we present "LLMGuard", a tool that monitors user interactions with an LLM application and flags content aga…

    Submitted 27 February, 2024; originally announced March 2024.

    Comments: accepted in demonstration track of AAAI-24

  21. arXiv:2402.07281  [pdf, other]

    cs.LG

    Can Tree Based Approaches Surpass Deep Learning in Anomaly Detection? A Benchmarking Study

    Authors: Santonu Sarkar, Shanay Mehta, Nicole Fernandes, Jyotirmoy Sarkar, Snehanshu Saha

    Abstract: Detection of anomalous situations for complex mission-critical systems holds paramount importance when their service continuity needs to be ensured. A major challenge in detecting anomalies from the operational data arises due to the imbalanced class distribution problem since the anomalies are supposed to be rare events. This paper evaluates a diverse array of machine learning-based anomaly detec…

    Submitted 25 February, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

  22. arXiv:2402.05983  [pdf, other]

    eess.IV cs.LG physics.app-ph physics.ins-det

    Capability enhancement of the X-ray micro-tomography system via ML-assisted approaches

    Authors: Dhruvi Shah, Shruti Mehta, Ashish Agrawal, Shishir Purohit, Bhaskar Chaudhury

    Abstract: Ring artifacts in X-ray micro-CT images are one of the primary causes of concern in their accurate visual interpretation and quantitative analysis. The geometry of X-ray micro-CT scanners is similar to that of medical CT machines, except the sample is rotated with a stationary source and detector. The ring artifacts are caused by a defect or non-linear responses in detector pixels during the MicroCT d…

    Submitted 8 February, 2024; originally announced February 2024.

  23. arXiv:2312.09299  [pdf, other]

    cs.LG cs.CL cs.CV

    Weight subcloning: direct initialization of transformers using larger pretrained ones

    Authors: Mohammad Samragh, Mehrdad Farajtabar, Sachin Mehta, Raviteja Vemulapalli, Fartash Faghri, Devang Naik, Oncel Tuzel, Mohammad Rastegari

    Abstract: Training large transformer models from scratch for a target task requires lots of data and is computationally demanding. The usual practice of transfer learning overcomes this challenge by initializing the model with weights of a pretrained model of the same size and specification to increase the convergence and training speed. However, what if no pretrained model of the required size is available…

    Submitted 14 December, 2023; originally announced December 2023.

  24. arXiv:2311.18237  [pdf, other]

    cs.CV cs.LG

    Knowledge Transfer from Vision Foundation Models for Efficient Training of Small Task-specific Models

    Authors: Raviteja Vemulapalli, Hadi Pouransari, Fartash Faghri, Sachin Mehta, Mehrdad Farajtabar, Mohammad Rastegari, Oncel Tuzel

    Abstract: Vision Foundation Models (VFMs) pretrained on massive datasets exhibit impressive performance on various downstream tasks, especially with limited labeled target data. However, due to their high inference compute cost, these models cannot be deployed for many real-world applications. Motivated by this, we ask the following important question, "How can we leverage the knowledge from a large VFM to…

    Submitted 1 July, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: International Conference on Machine Learning, 2024

  25. arXiv:2311.14712  [pdf, ps, other]

    cs.SI

    Multiagent Simulators for Social Networks

    Authors: Aditya Surve, Archit Rathod, Mokshit Surana, Gautam Malpani, Aneesh Shamraj, Sainath Reddy Sankepally, Raghav Jain, Swapneel S Mehta

    Abstract: Multiagent social network simulations are an avenue that can bridge the communication gap between the public and private platforms in order to develop solutions to a complex array of issues relating to online safety. While there are significant challenges relating to the scale of multiagent simulations, efficient learning from observational and interventional data to accurately model micro and mac…

    Submitted 16 November, 2023; originally announced November 2023.

  26. arXiv:2310.16303  [pdf, other]

    cs.CL cs.IR

    URL-BERT: Training Webpage Representations via Social Media Engagements

    Authors: Ayesha Qamar, Chetan Verma, Ahmed El-Kishky, Sumit Binnani, Sneha Mehta, Taylor Berg-Kirkpatrick

    Abstract: Understanding and representing webpages is crucial to online social networks where users may share and engage with URLs. Common language model (LM) encoders such as BERT can be used to understand and represent the textual content of webpages. However, these representations may not model thematic information of web domains and URLs or accurately capture their appeal to social media users. In this w…

    Submitted 24 October, 2023; originally announced October 2023.

  27. arXiv:2310.16226  [pdf, other]

    cs.CV cs.CL cs.LG

    TiC-CLIP: Continual Training of CLIP Models

    Authors: Saurabh Garg, Mehrdad Farajtabar, Hadi Pouransari, Raviteja Vemulapalli, Sachin Mehta, Oncel Tuzel, Vaishaal Shankar, Fartash Faghri

    Abstract: Keeping large foundation models up to date on the latest data is inherently expensive. To avoid the prohibitive costs of constantly retraining, it is imperative to continually train these models. This problem is exacerbated by the lack of any large scale continual learning benchmarks or baselines. We introduce the first set of web-scale Time-Continual (TiC) benchmarks for training vision-language mode…

    Submitted 21 March, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

  28. arXiv:2310.15308  [pdf, other]

    cs.CV cs.LG

    SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding

    Authors: Haoxiang Wang, Pavan Kumar Anasosalu Vasu, Fartash Faghri, Raviteja Vemulapalli, Mehrdad Farajtabar, Sachin Mehta, Mohammad Rastegari, Oncel Tuzel, Hadi Pouransari

    Abstract: The landscape of publicly available vision foundation models (VFMs), such as CLIP and Segment Anything Model (SAM), is expanding rapidly. VFMs are endowed with distinct capabilities stemming from their pre-training objectives. For instance, CLIP excels in semantic understanding, while SAM specializes in spatial understanding for segmentation. In this work, we introduce a simple recipe to efficient…

    Submitted 10 June, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

  29. arXiv:2310.14108  [pdf, other]

    cs.LG cs.AI cs.CV

    CLIP meets Model Zoo Experts: Pseudo-Supervision for Visual Enhancement

    Authors: Mohammadreza Salehi, Mehrdad Farajtabar, Maxwell Horton, Fartash Faghri, Hadi Pouransari, Raviteja Vemulapalli, Oncel Tuzel, Ali Farhadi, Mohammad Rastegari, Sachin Mehta

    Abstract: Contrastive language image pretraining (CLIP) is a standard method for training vision-language models. While CLIP is scalable, promptable, and robust to distribution shifts on image classification tasks, it lacks object localization capabilities. This paper studies the following question: Can we augment CLIP training with task-specific vision models from model zoos to improve its visual represent…

    Submitted 21 October, 2023; originally announced October 2023.

  30. arXiv:2310.12126  [pdf, other]

    cs.LG cs.AI cs.CL

    SHARCS: Efficient Transformers through Routing with Dynamic Width Sub-networks

    Authors: Mohammadreza Salehi, Sachin Mehta, Aditya Kusupati, Ali Farhadi, Hannaneh Hajishirzi

    Abstract: We introduce SHARCS for adaptive inference that takes into account the hardness of input samples. SHARCS can train a router on any transformer network, enabling the model to direct different samples to sub-networks with varying widths. Our experiments demonstrate that: (1) SHARCS outperforms or complements existing per-sample adaptive inference methods across various classification tasks in terms…

    Submitted 18 October, 2023; originally announced October 2023.

  31. arXiv:2310.05674  [pdf, other]

    cs.LG cs.AI

    Making Scalable Meta Learning Practical

    Authors: Sang Keun Choe, Sanket Vaibhav Mehta, Hwijeen Ahn, Willie Neiswanger, Pengtao Xie, Emma Strubell, Eric Xing

    Abstract: Despite its flexibility to learn diverse inductive biases in machine learning programs, meta learning (i.e., learning to learn) has long been recognized to suffer from poor scalability due to its tremendous compute/memory costs, training instability, and a lack of efficient distributed training support. In this work, we focus on making scalable meta learning practical by introducing SAMA, which co…

    Submitted 23 October, 2023; v1 submitted 9 October, 2023; originally announced October 2023.

  32. arXiv:2310.05181  [pdf, other]

    eess.AS cs.GR cs.HC cs.LG cs.SD

    Unified speech and gesture synthesis using flow matching

    Authors: Shivam Mehta, Ruibo Tu, Simon Alexanderson, Jonas Beskow, Éva Székely, Gustav Eje Henter

    Abstract: As text-to-speech technologies achieve remarkable naturalness in read-aloud tasks, there is growing interest in multimodal synthesis of verbal and non-verbal communicative behaviour, such as spontaneous speech and associated body gestures. This paper presents a novel, unified architecture for jointly synthesising speech acoustics and skeleton-based 3D gesture motion from text, trained using optima…

    Submitted 9 January, 2024; v1 submitted 8 October, 2023; originally announced October 2023.

    Comments: 5 pages, 1 figure. Final version, accepted to IEEE ICASSP 2024

    MSC Class: 68T07 (Primary); 68T42 (Secondary) ACM Class: I.2.7; I.2.6; H.5

  33. arXiv:2310.04564  [pdf, other]

    cs.LG cs.AI

    ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models

    Authors: Iman Mirzadeh, Keivan Alizadeh, Sachin Mehta, Carlo C Del Mundo, Oncel Tuzel, Golnoosh Samei, Mohammad Rastegari, Mehrdad Farajtabar

    Abstract: Large Language Models (LLMs) with billions of parameters have drastically transformed AI applications. However, their demanding computation during inference has raised significant challenges for deployment on resource-constrained devices. Despite recent trends favoring alternative activation functions such as GELU or SiLU, known for increased computation, this study strongly advocates for reinstat…

    Submitted 6 October, 2023; originally announced October 2023.

    Comments: preprint

  34. arXiv:2310.03937  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    Diffusion Models as Masked Audio-Video Learners

    Authors: Elvis Nunez, Yanzi Jin, Mohammad Rastegari, Sachin Mehta, Maxwell Horton

    Abstract: Over the past several years, the synchronization between audio and visual signals has been leveraged to learn richer audio-visual representations. Aided by the large availability of unlabeled videos, many unsupervised training frameworks have demonstrated impressive results in various downstream audio and video tasks. Recently, Masked Audio-Video Learners (MAViL) has emerged as a state-of-the-art…

    Submitted 4 January, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

    Comments: Camera-ready version for the Machine Learning for Audio Workshop at NeurIPS 2023

  35. arXiv:2309.05455  [pdf, other]

    eess.AS cs.HC cs.LG cs.SD

    Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation

    Authors: Anna Deichler, Shivam Mehta, Simon Alexanderson, Jonas Beskow

    Abstract: This paper describes a system developed for the GENEA (Generation and Evaluation of Non-verbal Behaviour for Embodied Agents) Challenge 2023. Our solution builds on an existing diffusion-based motion synthesis model. We propose a contrastive speech and motion pretraining (CSMP) module, which learns a joint embedding for speech and gesture with the aim to learn a semantic coupling between these mod…

    Submitted 11 September, 2023; originally announced September 2023.

    MSC Class: 68T42 ACM Class: I.2.6; I.2.7

  36. arXiv:2309.04502  [pdf, other]

    cs.CV

    On the Efficacy of Multi-scale Data Samplers for Vision Applications

    Authors: Elvis Nunez, Thomas Merth, Anish Prabhu, Mehrdad Farajtabar, Mohammad Rastegari, Sachin Mehta, Maxwell Horton

    Abstract: Multi-scale resolution training has seen an increased adoption across multiple vision tasks, including classification and detection. Training with smaller resolutions enables faster training at the expense of a drop in accuracy. Conversely, training with larger resolutions has been shown to improve performance, but memory constraints often make this infeasible. In this paper, we empirically study…

    Submitted 8 September, 2023; originally announced September 2023.

  37. arXiv:2309.03199  [pdf, other]

    eess.AS cs.HC cs.LG cs.SD

    Matcha-TTS: A fast TTS architecture with conditional flow matching

    Authors: Shivam Mehta, Ruibo Tu, Jonas Beskow, Éva Székely, Gustav Eje Henter

    Abstract: We introduce Matcha-TTS, a new encoder-decoder architecture for speedy TTS acoustic modelling, trained using optimal-transport conditional flow matching (OT-CFM). This yields an ODE-based decoder capable of high output quality in fewer synthesis steps than models trained using score matching. Careful design choices additionally ensure each synthesis step is fast to run. The method is probabilistic…

    Submitted 9 January, 2024; v1 submitted 6 September, 2023; originally announced September 2023.

    Comments: 5 pages, 3 figures. Final version, accepted to IEEE ICASSP 2024

    MSC Class: 68T07 ACM Class: I.2.7; I.2.6; H.5.5

  38. arXiv:2308.09863  [pdf, other]

    cs.RO

    StROL: Stabilized and Robust Online Learning from Humans

    Authors: Shaunak A. Mehta, Forrest Meng, Andrea Bajcsy, Dylan P. Losey

    Abstract: Robots often need to learn the human's reward function online, during the current interaction. This real-time learning requires fast but approximate learning rules: when the human's behavior is noisy or suboptimal, current approximations can result in unstable robot learning. Accordingly, in this paper we seek to enhance the robustness and convergence properties of gradient descent learning rules…

    Submitted 4 January, 2024; v1 submitted 18 August, 2023; originally announced August 2023.

  39. arXiv:2308.07973  [pdf, other]

    cs.CL

    "Beware of deception": Detecting Half-Truth and Debunking it through Controlled Claim Editing

    Authors: Sandeep Singamsetty, Nishtha Madaan, Sameep Mehta, Varad Bhatnagar, Pushpak Bhattacharyya

    Abstract: The prevalence of half-truths, which are statements containing some truth but that are ultimately deceptive, has risen with the increasing use of the internet. To help combat this problem, we have created a comprehensive pipeline consisting of a half-truth detection model and a claim editing model. Our approach utilizes the T5 model for controlled claim editing; "controlled" here means precise adj…

    Submitted 15 August, 2023; originally announced August 2023.

  40. arXiv:2307.07948  [pdf, ps, other]

    eess.AS cs.CL

    Model Adaptation for ASR in low-resource Indian Languages

    Authors: Abhayjeet Singh, Arjun Singh Mehta, Ashish Khuraishi K S, Deekshitha G, Gauri Date, Jai Nanavati, Jesuraja Bandekar, Karnalius Basumatary, Karthika P, Sandhya Badiger, Sathvik Udupa, Saurabh Kumar, Savitha, Prasanta Kumar Ghosh, Prashanthi V, Priyanka Pai, Raoul Nanavati, Rohan Saxena, Sai Praneeth Reddy Mora, Srinivasa Raghavan

    Abstract: Automatic speech recognition (ASR) performance has improved drastically in recent years, mainly enabled by self-supervised learning (SSL) based acoustic models such as wav2vec2 and large-scale multi-lingual training like Whisper. A huge challenge still exists for low-resource languages where the availability of both audio and text is limited. This is further complicated by the presence of multiple…

    Submitted 16 July, 2023; originally announced July 2023.

    Comments: ASRU Special session overview paper

  41. arXiv:2307.04245  [pdf, other]

    cs.CV cs.AI

    A Novel Pipeline for Improving Optical Character Recognition through Post-processing Using Natural Language Processing

    Authors: Aishik Rakshit, Samyak Mehta, Anirban Dasgupta

    Abstract: Optical Character Recognition (OCR) technology finds applications in digitizing books and unstructured documents, along with applications in other domains such as mobility statistics, law enforcement, traffic, security systems, etc. The state-of-the-art methods work well with the OCR with printed text on license plates, shop names, etc. However, applications such as printed textbooks and handwritt…

    Submitted 9 July, 2023; originally announced July 2023.

    Comments: Accepted in IEEE GCON (IEEE Guwahati Subsection Conference) 2023

  42. arXiv:2306.09417  [pdf, other]

    eess.AS cs.AI cs.CV cs.HC cs.LG

    Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis

    Authors: Shivam Mehta, Siyang Wang, Simon Alexanderson, Jonas Beskow, Éva Székely, Gustav Eje Henter

    Abstract: With read-aloud speech synthesis achieving high naturalness scores, there is a growing research interest in synthesising spontaneous speech. However, human spontaneous face-to-face conversation has both spoken and non-verbal aspects (here, co-speech gestures). Only recently has research begun to explore the benefits of jointly synthesising these two modalities in a single system. The previous stat…

    Submitted 9 August, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: 7 pages, 2 figures, presented at the ISCA Speech Synthesis Workshop (SSW) 2023

    MSC Class: 68T07 (Primary); 68T42 (Secondary)

    ACM Class: I.2.7; I.2.6; G.3; H.5.5

  43. CFL: Causally Fair Language Models Through Token-level Attribute Controlled Generation

    Authors: Rahul Madhavan, Rishabh Garg, Kahini Wadhawan, Sameep Mehta

    Abstract: We propose a method to control the attributes of Language Models (LMs) for the text generation task using Causal Average Treatment Effect (ATE) scores and counterfactual augmentation. We explore this method, in the context of LM detoxification, and propose the Causally Fair Language (CFL) architecture for detoxifying pre-trained LMs in a plug-and-play manner. Our architecture is based on a Structu…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: 19 pages, 10 figures. Findings of ACL 2023

    Journal ref: Findings of the Association for Computational Linguistics: ACL 2023

  44. arXiv:2306.00238  [pdf, other]

    cs.CV

    Bytes Are All You Need: Transformers Operating Directly On File Bytes

    Authors: Maxwell Horton, Sachin Mehta, Ali Farhadi, Mohammad Rastegari

    Abstract: Modern deep learning approaches usually utilize modality-specific processing. For example, the most common deep learning approach to image classification involves decoding image file bytes into an RGB tensor which is passed into a neural network. Instead, we investigate modality-independent representation learning by performing classification directly on file bytes, without the need for decoding f…

    Submitted 1 July, 2024; v1 submitted 31 May, 2023; originally announced June 2023.

    Journal ref: Transactions on Machine Learning Research 2835-8856 (2024)

  45. arXiv:2305.16443  [pdf, other]

    cs.CV

    Human-Machine Comparison for Cross-Race Face Verification: Race Bias at the Upper Limits of Performance?

    Authors: Geraldine Jeckeln, Selin Yavuzcan, Kate A. Marquis, Prajay Sandipkumar Mehta, Amy N. Yates, P. Jonathon Phillips, Alice J. O'Toole

    Abstract: Face recognition algorithms perform more accurately than humans in some cases, though humans and machines both show race-based accuracy differences. As algorithms continue to improve, it is important to continually assess their race bias relative to humans. We constructed a challenging test of 'cross-race' face verification and used it to compare humans and two state-of-the-art face recognition sy…

    Submitted 30 May, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: 8 pages, 6 figures

  46. arXiv:2305.00131  [pdf, other]

    cs.CV

    Regularizing Self-training for Unsupervised Domain Adaptation via Structural Constraints

    Authors: Rajshekhar Das, Jonathan Francis, Sanket Vaibhav Mehta, Jean Oh, Emma Strubell, Jose Moura

    Abstract: Self-training based on pseudo-labels has emerged as a dominant approach for addressing conditional distribution shifts in unsupervised domain adaptation (UDA) for semantic segmentation problems. A notable drawback, however, is that this family of approaches is susceptible to erroneous pseudo labels that arise from confirmation biases in the source domain and that manifest as nuisance factors in th…

    Submitted 28 April, 2023; originally announced May 2023.

  47. arXiv:2304.14916  [pdf, other]

    eess.SP cs.AI cs.HC cs.LG

    "Can't Take the Pressure?": Examining the Challenges of Blood Pressure Estimation via Pulse Wave Analysis

    Authors: Suril Mehta, Nipun Kwatra, Mohit Jain, Daniel McDuff

    Abstract: The use of observed wearable sensor data (e.g., photoplethysmograms [PPG]) to infer health measures (e.g., glucose level or blood pressure) is a very active area of research. Such technology can have a significant impact on health screening, chronic disease management and remote monitoring. A common approach is to collect sensor data and corresponding labels from a clinical grade device (e.g., blo…

    Submitted 23 April, 2023; originally announced April 2023.

  48. arXiv:2304.12404  [pdf, other]

    cs.CL

    Semantic Tokenizer for Enhanced Natural Language Processing

    Authors: Sandeep Mehta, Darpan Shah, Ravindra Kulkarni, Cornelia Caragea

    Abstract: Traditionally, NLP performance improvement has been focused on improving models and increasing the number of model parameters. NLP vocabulary construction has remained focused on maximizing the number of words represented through subword regularization. We present a novel tokenizer that uses semantics to drive vocabulary construction. The tokenizer includes a trainer that uses stemming to enhance…

    Submitted 24 April, 2023; originally announced April 2023.

  49. arXiv:2303.10854  [pdf, ps, other]

    cs.CY cs.AI

    Dynamic Documentation for AI Systems

    Authors: Soham Mehta, Anderson Rogers, Thomas Krendl Gilbert

    Abstract: AI documentation is a rapidly-growing channel for coordinating the design of AI technologies with policies for transparency and accessibility. Calls to standardize and enact documentation of algorithmic harms and impacts are now commonplace. However, documentation standards for AI remain inchoate, and fail to match the capabilities and social effects of increasingly impactful architectures such as…

    Submitted 20 March, 2023; originally announced March 2023.

  50. arXiv:2303.08983  [pdf, other]

    cs.CV cs.AI cs.LG

    Reinforce Data, Multiply Impact: Improved Model Accuracy and Robustness with Dataset Reinforcement

    Authors: Fartash Faghri, Hadi Pouransari, Sachin Mehta, Mehrdad Farajtabar, Ali Farhadi, Mohammad Rastegari, Oncel Tuzel

    Abstract: We propose Dataset Reinforcement, a strategy to improve a dataset once such that the accuracy of any model architecture trained on the reinforced dataset is improved at no additional training cost for users. We propose a Dataset Reinforcement strategy based on data augmentation and knowledge distillation. Our generic strategy is designed based on extensive analysis across CNN- and transformer-base…

    Submitted 22 September, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: Accepted at International Conference on Computer Vision (ICCV) 2023. v2: Camera-ready version with new Tables 9 and 10. v3: Correction to Table 7-Avg. column