Showing 1–50 of 754 results for author: Mohan

Searching in archive cs.
  1. arXiv:2408.17431  [pdf, other]

    eess.AS cs.AI

    Advancing Multi-talker ASR Performance with Large Language Models

    Authors: Mohan Shi, Zengrui Jin, Yaoxun Xu, Yong Xu, Shi-Xiong Zhang, Kun Wei, Yiwen Shao, Chunlei Zhang, Dong Yu

    Abstract: Recognizing overlapping speech from multiple speakers in conversational scenarios is one of the most challenging problems for automatic speech recognition (ASR). Serialized output training (SOT) is a classic method to address multi-talker ASR, with the idea of concatenating transcriptions from multiple speakers according to the emission times of their speech for training. However, SOT-style transcr…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: 8 pages, accepted by IEEE SLT 2024

  2. arXiv:2408.16885  [pdf]

    cs.CR cs.ET cs.IR

    A Prototype Model of Zero-Trust Architecture Blockchain with EigenTrust-Based Practical Byzantine Fault Tolerance Protocol to Manage Decentralized Clinical Trials

    Authors: Ashok Kumar Peepliwall, Hari Mohan Pandey, Surya Prakash, Anand A Mahajan, Sudhinder Singh Chowhan, Vinesh Kumar, Rahul Sharma

    Abstract: The COVID-19 pandemic necessitated the emergence of decentralized clinical trials (DCTs) to address patient retention, accelerate trials, improve data accessibility, enable virtual care, and facilitate seamless communication through integrated systems. However, integrating systems in DCTs exposes clinical data to potential security threats, making them susceptible to theft at any stage, a high risk of…

    Submitted 29 August, 2024; originally announced August 2024.

  3. arXiv:2408.16423  [pdf, other]

    eess.AS cs.SD

    WHISMA: A Speech-LLM to Perform Zero-shot Spoken Language Understanding

    Authors: Mohan Li, Cong-Thanh Do, Simon Keizer, Youmna Farag, Svetlana Stoyanchev, Rama Doddipatla

    Abstract: Speech large language models (speech-LLMs) integrate speech and text-based foundation models to provide a unified framework for handling a wide range of downstream tasks. In this paper, we introduce WHISMA, a speech-LLM tailored for spoken language understanding (SLU) that demonstrates robust performance in various zero-shot settings. WHISMA combines the speech encoder from Whisper with the Llama-…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: accepted to SLT 2024

  4. arXiv:2408.15637  [pdf, other]

    cs.CV

    Transfer Learning from Simulated to Real Scenes for Monocular 3D Object Detection

    Authors: Sondos Mohamed, Walter Zimmer, Ross Greer, Ahmed Alaaeldin Ghita, Modesto Castrillón-Santana, Mohan Trivedi, Alois Knoll, Salvatore Mario Carta, Mirko Marras

    Abstract: Accurately detecting 3D objects from monocular images in dynamic roadside scenarios remains a challenging problem due to varying camera perspectives and unpredictable scene conditions. This paper introduces a two-stage training strategy to address these challenges. Our approach initially trains a model on the large-scale synthetic dataset, RoadSense3D, which offers a diverse range of scenarios for…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 18 pages. Accepted for ECVA European Conference on Computer Vision 2024 (ECCV'24)

  5. arXiv:2408.11706  [pdf, other]

    cs.CV

    FRAP: Faithful and Realistic Text-to-Image Generation with Adaptive Prompt Weighting

    Authors: Liyao Jiang, Negar Hassanpour, Mohammad Salameh, Mohan Sai Singamsetti, Fengyu Sun, Wei Lu, Di Niu

    Abstract: Text-to-image (T2I) diffusion models have demonstrated impressive capabilities in generating high-quality images given a text prompt. However, ensuring the prompt-image alignment remains a considerable challenge, i.e., generating images that faithfully align with the prompt's semantics. Recent works attempt to improve the faithfulness by optimizing the latent code, which potentially could cause th…

    Submitted 21 August, 2024; originally announced August 2024.

  6. arXiv:2408.08002  [pdf, other]

    cs.CR

    Practical Privacy-Preserving Identity Verification using Third-Party Cloud Services and FHE (Role of Data Encoding in Circuit Depth Management)

    Authors: Deep Inder Mohan, Srinivas Vivek

    Abstract: National digital identity verification systems have played a critical role in the effective distribution of goods and services, particularly in developing countries. Due to the cost involved in deploying and maintaining such systems, combined with a lack of in-house technical expertise, governments seek to outsource this service to third-party cloud service providers to the extent possible. This…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: This work was presented (without proceedings) at the Turing Trustworthy Digital Identity International Conference 2022 at The Alan Turing Institute, London, UK, on Sep. 16, 2022

  7. arXiv:2408.07942  [pdf, other]

    cs.RO cs.MA

    Time-Ordered Ad-hoc Resource Sharing for Independent Robotic Agents

    Authors: Arjo Chakravarty, Michael X. Grey, M. A. Viraj J. Muthugala, Mohan Rajesh Elara

    Abstract: Resource sharing is a crucial part of a multi-robot system. We propose a Boolean satisfiability based approach to resource sharing. Our key contributions are an algorithm for converting any constrained assignment to a weighted-SAT based optimization. We propose a theorem that allows optimal resource assignment problems to be solved via repeated application of a SAT solver. Additionally we show a w…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: IROS 2024

  8. arXiv:2408.02964  [pdf, other]

    cs.CL

    Accuracy and Consistency of LLMs in the Registered Dietitian Exam: The Impact of Prompt Engineering and Knowledge Retrieval

    Authors: Iman Azimi, Mohan Qi, Li Wang, Amir M. Rahmani, Youlin Li

    Abstract: Large language models (LLMs) are fundamentally transforming human-facing applications in the health and well-being domains: boosting patient engagement, accelerating clinical decision-making, and facilitating medical education. Although state-of-the-art LLMs have shown superior performance in several conversational applications, evaluations within nutrition and diet applications are still insuffic…

    Submitted 7 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

  9. arXiv:2408.00778  [pdf, other]

    cs.HC cs.AI cs.LG

    Frontend Diffusion: Exploring Intent-Based User Interfaces through Abstract-to-Detailed Task Transitions

    Authors: Qinshi Zhang, Latisha Besariani Hendra, Mohan Chi, Zijian Ding

    Abstract: The emergence of Generative AI is catalyzing a paradigm shift in user interfaces from command-based to intent-based outcome specification. In this paper, we explore abstract-to-detailed task transitions in the context of frontend code generation as a step towards intent-based user interfaces, aiming to bridge the gap between abstract user intentions and concrete implementations. We introduce Front…

    Submitted 16 July, 2024; originally announced August 2024.

  10. arXiv:2407.21783  [pdf, other]

    cs.AI cs.CL cs.CV

    The Llama 3 Herd of Models

    Authors: Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, Bethany Biron, Binh Tang, et al. (510 additional authors not shown)

    Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical…

    Submitted 15 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  11. arXiv:2407.21652  [pdf, other]

    cs.CV cs.AI cs.LG

    Spatial Transformer Network YOLO Model for Agricultural Object Detection

    Authors: Yash Zambre, Ekdev Rajkitkul, Akshatha Mohan, Joshua Peeples

    Abstract: Object detection plays a crucial role in the field of computer vision by autonomously identifying and locating objects of interest. The You Only Look Once (YOLO) model is an effective single-shot detector. However, YOLO faces challenges in cluttered or partially occluded scenes and can struggle with small, low-contrast objects. We propose a new method that integrates spatial transformer networks (…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: 7 pages, 5 figures, submitted for review

  12. arXiv:2407.20741  [pdf, other]

    cs.LG math.DS math.NA

    Improving PINNs By Algebraic Inclusion of Boundary and Initial Conditions

    Authors: Mohan Ren, Zhihao Fang, Keren Li, Anirbit Mukherjee

    Abstract: "AI for Science" aims to solve fundamental scientific problems using AI techniques. As most physical phenomena can be described as Partial Differential Equations (PDEs), approximating their solutions using neural networks has evolved as a central component of scientific-ML. Physics-Informed Neural Networks (PINNs) is the general method that has evolved for this task but its training is well-known…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: 48 Pages, 25 Figures

  13. arXiv:2407.19852  [pdf]

    quant-ph cs.LG q-bio.BM

    Quantum Long Short-Term Memory for Drug Discovery

    Authors: Liang Zhang, Yin Xu, Mohan Wu, Liang Wang, Hua Xu

    Abstract: Quantum computing combined with machine learning (ML) is an extremely promising research area, with numerous studies demonstrating that quantum machine learning (QML) is expected to solve scientific problems more effectively than classical ML. In this work, we successfully apply QML to drug discovery, showing that QML can significantly improve model performance and achieve faster convergence compa…

    Submitted 29 July, 2024; originally announced July 2024.

  14. arXiv:2407.18911  [pdf, other]

    cs.RO cs.CV

    HRP: Human Affordances for Robotic Pre-Training

    Authors: Mohan Kumar Srirama, Sudeep Dasari, Shikhar Bahl, Abhinav Gupta

    Abstract: In order to *generalize* to various tasks in the wild, robotic agents will need a suitable representation (i.e., vision network) that enables the robot to predict optimal actions given high dimensional vision inputs. However, learning such a representation requires an extreme amount of diverse training data, which is prohibitively expensive to collect on a real robot. How can we overcome this prob…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: Accepted to Robotics Science and Systems 2024

  15. arXiv:2407.16384  [pdf, other]

    cs.CV

    A Multitask Deep Learning Model for Classification and Regression of Hyperspectral Images: Application to the large-scale dataset

    Authors: Koushikey Chhapariya, Alexandre Benoit, Krishna Mohan Buddhiraju, Anil Kumar

    Abstract: Multitask learning is a widely recognized technique in the fields of computer vision and deep learning. However, it is still a research question in remote sensing, particularly for hyperspectral imaging. Moreover, most of the research in the remote sensing domain focuses on small and single-task-based annotated datasets, which limits the generalizability and scalability of the developed mode…

    Submitted 23 July, 2024; originally announced July 2024.

  16. arXiv:2407.15003  [pdf, other]

    cs.CR

    Requiem for a drone: a machine-learning based framework for stealthy attacks against unmanned autonomous vehicles

    Authors: Kyo Hyun Kim, Denizhan Kara, Vineetha Paruchuri, Sibin Mohan, Greg Kimberly, Jae Kim, Josh Eckhardt

    Abstract: There is a space of uncertainty in the modeling of vehicular dynamics of autonomous systems due to noise in sensor readings, environmental factors or modeling errors. We present Requiem, a software-only, blackbox approach that exploits this space in a stealthy manner causing target systems, e.g., unmanned aerial vehicles (UAVs), to significantly deviate from their mission parameters. Our system ac…

    Submitted 20 July, 2024; originally announced July 2024.

  17. arXiv:2407.14757  [pdf, other]

    cs.CV

    Enhancing Skin Disease Classification Leveraging Transformer-based Deep Learning Architectures and Explainable AI

    Authors: Jayanth Mohan, Arrun Sivasubramanian, V Sowmya, Ravi Vinayakumar

    Abstract: Skin diseases affect over a third of the global population, yet their impact is often underestimated. Automating skin disease classification to assist doctors with their prognosis might be difficult. Nevertheless, due to efficient feature extraction pipelines, deep learning techniques have shown much promise for various tasks, including dermatological disease identification. This study uses a skin…

    Submitted 20 July, 2024; originally announced July 2024.

    Comments: Submitted to Computers in Biology and Medicine

  18. arXiv:2407.14103  [pdf, other]

    cs.CV

    Zero-Shot Underwater Gesture Recognition

    Authors: Sandipan Sarma, Gundameedi Sai Ram Mohan, Hariansh Sehgal, Arijit Sur

    Abstract: Hand gesture recognition allows humans to interact with machines non-verbally, which has a huge application in underwater exploration using autonomous underwater vehicles. Recently, a new gesture-based language called CADDIAN has been devised for divers, and supervised learning methods have been applied to recognize the gestures with high accuracy. However, such methods fail when they encounter un…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: Accepted to ICPR 2024. 15 pages, 6 figures. Project page: https://github.com/sandipan211/ZSUGR

  19. arXiv:2407.13513  [pdf, other]

    cs.LG

    Instance Selection for Dynamic Algorithm Configuration with Reinforcement Learning: Improving Generalization

    Authors: Carolin Benjamins, Gjorgjina Cenikj, Ana Nikolikj, Aditya Mohan, Tome Eftimov, Marius Lindauer

    Abstract: Dynamic Algorithm Configuration (DAC) addresses the challenge of dynamically setting hyperparameters of an algorithm for a diverse set of instances rather than focusing solely on individual tasks. Agents trained with Deep Reinforcement Learning (RL) offer a pathway to solve such settings. However, the limited generalization performance of these agents has significantly hindered the application in…

    Submitted 18 July, 2024; originally announced July 2024.

    Journal ref: GECCO 2024

  20. arXiv:2407.12699  [pdf, other]

    cs.GT

    Mechanism Design via the Interim Relaxation

    Authors: Kshipra Bhawalkar, Marios Mertzanidis, Divyarthi Mohan, Alexandros Psomas

    Abstract: We study revenue maximization for agents with additive preferences, subject to downward-closed constraints on the set of feasible allocations. In seminal work, Alaei~\cite{alaei2014bayesian} introduced a powerful multi-to-single agent reduction based on an ex-ante relaxation of the multi-agent problem. This reduction employs a rounding procedure which is an online contention resolution scheme (OCR…

    Submitted 17 July, 2024; originally announced July 2024.

  21. arXiv:2407.09556  [pdf]

    cs.CV cs.AI

    Explainable Image Captioning using CNN-CNN architecture and Hierarchical Attention

    Authors: Rishi Kesav Mohan, Sanjay Sureshkumar, Vignesh Sivasubramaniam

    Abstract: Image captioning is a technology that produces text-based descriptions for an image. Deep learning-based solutions built on top of feature recognition may very well serve the purpose. But as with any other machine learning solution, the user understanding in the process of caption generation is poor and the model does not provide any explanation for its predictions and hence the conventional metho…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: 23 pages, 9 figures

    MSC Class: 68T50; ACM Class: I.2.7

  22. arXiv:2407.08280  [pdf, other]

    cs.CV cs.GR cs.RO

    WayveScenes101: A Dataset and Benchmark for Novel View Synthesis in Autonomous Driving

    Authors: Jannik Zürn, Paul Gladkov, Sofía Dudas, Fergal Cotter, Sofi Toteva, Jamie Shotton, Vasiliki Simaiaki, Nikhil Mohan

    Abstract: We present WayveScenes101, a dataset designed to help the community advance the state of the art in novel view synthesis that focuses on challenging driving scenes containing many dynamic and deformable elements with changing geometry and texture. The dataset comprises 101 driving scenes across a wide range of environmental conditions and driving scenarios. The dataset is designed for benchmarking…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 7 pages

  23. arXiv:2407.07000  [pdf, other]

    cs.LG cs.AI cs.CL cs.DC

    Etalon: Holistic Performance Evaluation Framework for LLM Inference Systems

    Authors: Amey Agrawal, Anmol Agarwal, Nitin Kedia, Jayashree Mohan, Souvik Kundu, Nipun Kwatra, Ramachandran Ramjee, Alexey Tumanov

    Abstract: Serving large language models (LLMs) in production can incur substantial costs, which has prompted recent advances in inference system optimizations. Today, these systems are evaluated against conventional latency and throughput metrics (e.g., TTFT, TBT, Normalised Latency and TPOT). However, these metrics fail to fully capture the nuances of LLM inference, leading to an incomplete assessment of use…

    Submitted 29 August, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

  24. arXiv:2407.05467  [pdf, other]

    cs.DC cs.AI

    The infrastructure powering IBM's Gen AI model development

    Authors: Talia Gershon, Seetharami Seelam, Brian Belgodere, Milton Bonilla, Lan Hoang, Danny Barnett, I-Hsin Chung, Apoorve Mohan, Ming-Hung Chen, Lixiang Luo, Robert Walkup, Constantinos Evangelinos, Shweta Salaria, Marc Dombrowa, Yoonho Park, Apo Kayi, Liran Schour, Alim Alim, Ali Sydney, Pavlos Maniotis, Laurent Schares, Bernard Metzler, Bengi Karacali-Akyamac, Sophia Wen, Tatsuhiro Chiba, et al. (121 additional authors not shown)

    Abstract: AI Infrastructure plays a key role in the speed and cost-competitiveness of developing and deploying advanced AI models. The current demand for powerful AI infrastructure for model training is driven by the emergence of generative AI and foundational models, where on occasion thousands of GPUs must cooperate on a single training job for the model to be trained in a reasonable time. Delivering effi…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Corresponding Authors: Talia Gershon, Seetharami Seelam, Brian Belgodere, Milton Bonilla

  25. arXiv:2407.04589  [pdf, other]

    cs.LG

    Remembering Everything Makes You Vulnerable: A Limelight on Machine Unlearning for Personalized Healthcare Sector

    Authors: Ahan Chatterjee, Sai Anirudh Aryasomayajula, Rajat Chaudhari, Subhajit Paul, Vishwa Mohan Singh

    Abstract: As the prevalence of data-driven technologies in healthcare continues to rise, concerns regarding data privacy and security become increasingly paramount. This thesis aims to address the vulnerability of personalized healthcare models, particularly in the context of ECG monitoring, to adversarial attacks that compromise patient privacy. We propose an approach termed "Machine Unlearning" to mitigat…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 15 Pages, Exploring unlearning techniques on ECG Classifier

  26. Treatment of near-incompressibility and volumetric locking in higher order material point methods

    Authors: Ram Mohan Telikicherla, Georgios Moutsanidis

    Abstract: We propose a novel projection method to treat near-incompressibility and volumetric locking in small- and large-deformation elasticity and plasticity within the context of higher order material point methods. The material point method is well known to exhibit volumetric locking due to the presence of large numbers of material points per element that are used to decrease the quadrature error. Altho…

    Submitted 4 July, 2024; originally announced July 2024.

    Journal ref: Computer Methods in Applied Mechanics and Engineering 395 (2022) 114985

  27. arXiv:2407.01374  [pdf, other]

    cs.CL

    Bridging the Gap: Transfer Learning from English PLMs to Malaysian English

    Authors: Mohan Raj Chanthran, Lay-Ki Soon, Huey Fang Ong, Bhawani Selvaretnam

    Abstract: Malaysian English is a low-resource creole language that carries elements of the Malay, Chinese, and Tamil languages, in addition to Standard English. Named Entity Recognition (NER) models underperform when capturing entities from Malaysian English text due to its distinctive morphosyntactic adaptations, semantic features and code-switching (mixing English and Malay). Considering these gaps,…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted in 9th Workshop on Representation Learning for NLP (Rep4NLP) at ACL 2024

  28. arXiv:2406.14908  [pdf, other]

    cs.HC

    Can we say a cat is a cat? Understanding the challenges in annotating physiological signal-based emotion data

    Authors: Pragya Singh, Mohan Kumar, Pushpendra Singh

    Abstract: Artificial Intelligence (AI) algorithms, trained on emotion data extracted from physiological signals, provide a promising approach to monitoring emotions, affect, and mental well-being. However, the field encounters challenges because there is a lack of effective methods for collecting high-quality data in everyday settings that genuinely reflect changes in emotion or affect. This paper presents…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 7 pages, To be published at PhysioCHI: Towards Best Practices for Integrating Physiological Signals in HCI, May 11, 2024, Honolulu, HI, USA

  29. arXiv:2406.12053  [pdf, other]

    cs.CL

    InternalInspector $I^2$: Robust Confidence Estimation in LLMs through Internal States

    Authors: Mohammad Beigi, Ying Shen, Runing Yang, Zihao Lin, Qifan Wang, Ankith Mohan, Jianfeng He, Ming Jin, Chang-Tien Lu, Lifu Huang

    Abstract: Despite their vast capabilities, Large Language Models (LLMs) often struggle with generating reliable outputs, frequently producing high-confidence inaccuracies known as hallucinations. Addressing this challenge, our research introduces InternalInspector, a novel framework designed to enhance confidence estimation in LLMs by leveraging contrastive learning on internal states including attention st…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 8 pages

  30. arXiv:2406.11877  [pdf]

    physics.ao-ph cs.LG

    Solar Power Prediction Using Satellite Data in Different Parts of Nepal

    Authors: Raj Krishna Nepal, Bibek Khanal, Vibek Ghimire, Kismat Neupane, Atul Pokharel, Kshitij Niraula, Baburam Tiwari, Nawaraj Bhattarai, Khem N. Poudyal, Nawaraj Karki, Mohan B Dangi, John Biden

    Abstract: Due to the unavailability of solar irradiance data for many potential sites of Nepal, the paper proposes predicting solar irradiance based on alternative meteorological parameters. The study focuses on five distinct regions in Nepal and utilizes a dataset spanning almost ten years, obtained from CERES SYN1deg and MERRA-2. Machine learning models such as Random Forest, XGBoost, K-Nearest Neighbors,…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: 20 pages, 12 figures, 5 tables

  31. arXiv:2406.10797  [pdf, other]

    cs.CV

    STAR: Scale-wise Text-to-image generation via Auto-Regressive representations

    Authors: Xiaoxiao Ma, Mohan Zhou, Tao Liang, Yalong Bai, Tiejun Zhao, Huaian Chen, Yi Jin

    Abstract: We present STAR, a text-to-image model that employs a scale-wise auto-regressive paradigm. Unlike VAR, which is limited to class-conditioned synthesis within a fixed set of predetermined categories, our STAR enables text-driven open-set generation through three key designs: To boost diversity and generalizability with unseen combinations of objects and concepts, we introduce a pre-trained text encod…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 12 pages, 6 figures

  32. arXiv:2406.09961  [pdf, other]

    cs.SE cs.CL cs.CV

    ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation

    Authors: Chufan Shi, Cheng Yang, Yaxin Liu, Bo Shui, Junjie Wang, Mohan Jing, Linran Xu, Xinyu Zhu, Siheng Li, Yuxiang Zhang, Gongye Liu, Xiaomei Nie, Deng Cai, Yujiu Yang

    Abstract: We introduce a new benchmark, ChartMimic, aimed at assessing the visually-grounded code generation capabilities of large multimodal models (LMMs). ChartMimic utilizes information-intensive visual charts and textual instructions as inputs, requiring LMMs to generate the corresponding code for chart rendering. ChartMimic includes 1,000 human-curated (figure, instruction, code) triplets, which repres…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Data and code are available at https://github.com/ChartMimic/ChartMimic

  33. arXiv:2406.07332  [pdf, other]

    cs.CV

    Minimizing Energy Costs in Deep Learning Model Training: The Gaussian Sampling Approach

    Authors: Challapalli Phanindra Revanth, Sumohana S. Channappayya, C Krishna Mohan

    Abstract: Computing the loss gradient via backpropagation consumes considerable energy during deep learning (DL) model training. In this paper, we propose a novel approach to efficiently compute DL models' gradients to mitigate the substantial energy overhead associated with backpropagation. Exploiting the over-parameterized nature of DL models and the smoothness of their loss landscapes, we propose a metho…

    Submitted 11 June, 2024; originally announced June 2024.

  34. arXiv:2406.04629  [pdf, other]

    cs.CV cs.GR cs.MM

    STAR: Skeleton-aware Text-based 4D Avatar Generation with In-Network Motion Retargeting

    Authors: Zenghao Chai, Chen Tang, Yongkang Wong, Mohan Kankanhalli

    Abstract: The creation of 4D avatars (i.e., animated 3D avatars) from text description typically uses text-to-image (T2I) diffusion models to synthesize 3D avatars in the canonical space and subsequently applies animation with target motions. However, such an optimization-by-animation paradigm has several drawbacks. (1) For pose-agnostic optimization, the rendered images in canonical pose for naive Score Di…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Tech report

  35. arXiv:2406.01609  [pdf, other]

    cs.IR cs.CL cs.LG

    Judgement Citation Retrieval using Contextual Similarity

    Authors: Akshat Mohan Dasula, Hrushitha Tigulla, Preethika Bhukya

    Abstract: Traditionally in the domain of legal research, the retrieval of pertinent citations from intricate case descriptions has demanded manual effort and keyword-based search applications that mandate expertise in understanding legal jargon. Legal case descriptions hold pivotal information for legal professionals and researchers, necessitating more efficient and automated approaches. We propose a method…

    Submitted 15 August, 2024; v1 submitted 28 May, 2024; originally announced June 2024.

    Comments: 14 pages, 16 images

  36. arXiv:2406.00071  [pdf]

    astro-ph.IM astro-ph.SR cs.LG

    Optimizing Photometric Light Curve Analysis: Evaluating Scipy's Minimize Function for Eclipse Mapping of Cataclysmic Variables

    Authors: Anoop Kumar, Madan Mohan Tito Ayyalasomayajula, Dheerendra Panwar, Yeshwanth Vasa

    Abstract: With a particular focus on Scipy's minimize function, the eclipse mapping method is thoroughly researched and implemented utilizing Python and essential libraries. Many optimization techniques are used, including Sequential Least Squares Programming (SLSQP), Nelder-Mead, and Conjugate Gradient (CG). However, for the purpose of examining photometric light curves, these methods seek to solve the maxim…

    Submitted 30 May, 2024; originally announced June 2024.

  37. arXiv:2405.18836  [pdf, other]

    stat.ME cs.LG

    Do Finetti: On Causal Effects for Exchangeable Data

    Authors: Siyuan Guo, Chi Zhang, Karthika Mohan, Ferenc Huszár, Bernhard Schölkopf

    Abstract: We study causal effect estimation in a setting where the data are not i.i.d. (independent and identically distributed). We focus on exchangeable data satisfying an assumption of independent causal mechanisms. Traditional causal effect estimation frameworks, e.g., relying on structural causal models and do-calculus, are typically limited to i.i.d. data and do not extend to more general exchangeable…

    Submitted 29 May, 2024; originally announced May 2024.

  38. arXiv:2405.18351  [pdf, other]

    cs.LG astro-ph.IM

    Evaluating Bayesian deep learning for radio galaxy classification

    Authors: Devina Mohan, Anna M. M. Scaife

    Abstract: The radio astronomy community is rapidly adopting deep learning techniques to deal with the huge data volumes expected from the next generation of radio observatories. Bayesian neural networks (BNNs) provide a principled way to model uncertainty in the predictions made by such deep learning models and will play an important role in extracting well-calibrated uncertainty estimates on their outputs.…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Accepted to the 40th Conference on Uncertainty in Artificial Intelligence (UAI 2024)

  39. arXiv:2405.17731  [pdf, other]

    cs.DB

    Evaluating NoSQL Databases for OLAP Workloads: A Benchmarking Study of MongoDB, Redis, Kudu and ArangoDB

    Authors: Rishi Kesav Mohan, Risheek Rakshit Sukumar Kanmani, Krishna Anandan Ganesan, Nisha Ramasubramanian

    Abstract: In the era of big data, conventional RDBMS models have become impractical for handling colossal workloads. Consequently, NoSQL databases have emerged as the preferred storage solutions for executing processing-intensive Online Analytical Processing (OLAP) tasks. Within the realm of NoSQL databases, various classifications exist based on their data storage mechanisms, making it challenging to selec…

    Submitted 27 May, 2024; originally announced May 2024.

  40. arXiv:2405.16934  [pdf, other]

    cs.CV

    Do Vision-Language Transformers Exhibit Visual Commonsense? An Empirical Study of VCR

    Authors: Zhenyang Li, Yangyang Guo, Kejie Wang, Xiaolin Chen, Liqiang Nie, Mohan Kankanhalli

    Abstract: Visual Commonsense Reasoning (VCR) calls for explanatory reasoning behind question answering over visual scenes. To achieve this goal, a model is required to provide an acceptable rationale as the reason for the predicted answers. Progress on the benchmark dataset stems largely from the recent advancement of Vision-Language Transformers (VL Transformers). These models are first pre-trained on some…

    Submitted 27 May, 2024; originally announced May 2024.

  41. arXiv:2405.15328  [pdf, other]

    cs.LG cs.IR

    Multi-Modal Recommendation Unlearning

    Authors: Yash Sinha, Murari Mandal, Mohan Kankanhalli

    Abstract: Unlearning methods for recommender systems (RS) have emerged to address privacy issues and concerns about legal compliance. However, evolving user preferences and content licensing issues still remain unaddressed. This is particularly true in the case of multi-modal recommender systems (MMRS), which aim to accommodate the growing influence of multi-modal information on user preferences. Previous unlea…

    Submitted 24 May, 2024; originally announced May 2024.

  42. arXiv:2405.13911  [pdf, other]

    cs.CV cs.AI cs.CL

    TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment

    Authors: Wei Li, Hehe Fan, Yongkang Wong, Mohan Kankanhalli, Yi Yang

    Abstract: Recent advancements in image understanding have benefited from the extensive use of web image-text pairs. However, video understanding remains a challenge despite the availability of substantial web video-text data. This difficulty primarily arises from the inherent complexity of videos and the inefficient language supervision in recent web-collected video-text datasets. In this paper, we introduc…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 32 pages, 12 figures, 11 tables

  43. arXiv:2405.12538  [pdf, other]

    cs.CV

    Bridging the Intent Gap: Knowledge-Enhanced Visual Generation

    Authors: Yi Cheng, Ziwei Xu, Dongyun Lin, Harry Cheng, Yongkang Wong, Ying Sun, Joo Hwee Lim, Mohan Kankanhalli

    Abstract: For visual content generation, discrepancies between user intentions and the generated content have been a longstanding problem. This discrepancy arises from two main factors. First, user intentions are inherently complex, with subtle details not fully captured by input prompts. The absence of such details makes it challenging for generative models to accurately reflect the intended meaning, leadi…

    Submitted 21 May, 2024; originally announced May 2024.

  44. arXiv:2405.11180  [pdf, other]

    cs.CV cs.HC

    GestFormer: Multiscale Wavelet Pooling Transformer Network for Dynamic Hand Gesture Recognition

    Authors: Mallika Garg, Debashis Ghosh, Pyari Mohan Pradhan

    Abstract: Transformer models have achieved state-of-the-art results in many applications such as NLP and classification, but their exploration in gesture recognition tasks is still limited. We therefore propose a novel GestFormer architecture for dynamic hand gesture recognition. The motivation behind this design is to propose a resource-efficient transformer model, since transformers are computationally expensive a…

    Submitted 18 May, 2024; originally announced May 2024.

  45. arXiv:2405.09049  [pdf, other]

    cs.LG cs.AI cs.CV cs.RO

    Perception Without Vision for Trajectory Prediction: Ego Vehicle Dynamics as Scene Representation for Efficient Active Learning in Autonomous Driving

    Authors: Ross Greer, Mohan Trivedi

    Abstract: This study investigates the use of trajectory and dynamic state information for efficient data curation in autonomous driving machine learning tasks. We propose methods for clustering trajectory-states and sampling strategies in an active learning framework, aiming to reduce annotation and data costs while maintaining model performance. Our approach leverages trajectory information to guide data s…

    Submitted 20 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

  46. InsightNet: Structured Insight Mining from Customer Feedback

    Authors: Sandeep Sricharan Mukku, Manan Soni, Jitenkumar Rana, Chetan Aggarwal, Promod Yenigalla, Rashmi Patange, Shyam Mohan

    Abstract: We propose InsightNet, a novel approach for the automated extraction of structured insights from customer reviews. Our end-to-end machine learning framework is designed to overcome the limitations of current solutions, including the absence of structure for identified topics, non-standard aspect names, and lack of abundant training data. The proposed solution builds a semi-supervised multi-level t…

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: EMNLP 2023

  47. arXiv:2405.05465  [pdf, other]

    cs.LG cs.AI cs.CL

    Vidur: A Large-Scale Simulation Framework For LLM Inference

    Authors: Amey Agrawal, Nitin Kedia, Jayashree Mohan, Ashish Panwar, Nipun Kwatra, Bhargav Gulavani, Ramachandran Ramjee, Alexey Tumanov

    Abstract: Optimizing the deployment of large language models (LLMs) is expensive today since it requires experimentally running an application workload against an LLM implementation while exploring the large configuration space formed by system knobs such as parallelization strategies, batching techniques, and scheduling policies. To address this challenge, we present Vidur, a large-scale, high-fidelity, easil…

    Submitted 21 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

  48. arXiv:2405.04437  [pdf, other]

    cs.LG cs.OS

    vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention

    Authors: Ramya Prabhu, Ajay Nayak, Jayashree Mohan, Ramachandran Ramjee, Ashish Panwar

    Abstract: Efficient management of GPU memory is essential for high-throughput LLM inference. Prior systems reserved KV-cache memory ahead of time, which resulted in wasted capacity due to internal fragmentation. Inspired by demand paging, vLLM proposed PagedAttention to enable dynamic memory allocation for KV-cache. This approach eliminates fragmentation and improves serving throughput. However, to be…

    Submitted 12 July, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: 14 pages, 13 figures, 10 tables

  49. arXiv:2405.03411  [pdf, other]

    cs.RO

    Greedy Heuristics for Sampling-based Motion Planning in High-Dimensional State Spaces

    Authors: Phone Thiha Kyaw, Anh Vu Le, Lim Yi, Prabakaran Veerajagadheswar, Mohan Rajesh Elara, Dinh Tung Vo, Minh Bui Vu

    Abstract: Sampling-based motion planning algorithms are very effective at finding solutions in high-dimensional continuous state spaces as they do not require prior approximations of the problem domain compared to traditional discrete graph-based searches. The anytime version of the Rapidly-exploring Random Trees (RRT) algorithm, denoted as RRT*, often finds high-quality solutions by incrementally approxima…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: To be published at the International Journal of Robotics Research (IJRR)

  50. arXiv:2404.19075  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG math.NA

    Distributed Stochastic Optimization of a Neural Representation Network for Time-Space Tomography Reconstruction

    Authors: K. Aditya Mohan, Massimiliano Ferrucci, Chuck Divin, Garrett A. Stevenson, Hyojin Kim

    Abstract: 4D time-space reconstruction of dynamic events or deforming objects using X-ray computed tomography (CT) is an extremely ill-posed inverse problem. Existing approaches assume that the object remains static for the duration of several tens or hundreds of X-ray projection measurement images (reconstruction of consecutive limited-angle CT scans). However, this is an unrealistic assumption for many in…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Submitted to Nature Machine Intelligence