Showing 1–50 of 1,059 results for author: Kumar, A

Searching in archive cs.
  1. arXiv:2407.10317  [pdf, other]

    cs.AR

    Towards Efficient Design Verification -- Constrained Random Verification using PyUVM

    Authors: Deepak Narayan Gadde, Suruchi Kumari, Aman Kumar

    Abstract: Python, as a multi-paradigm language known for its ease of integration with other languages, has gained significant attention among verification engineers recently. A Python-based verification environment capitalizes on open-source frameworks such as PyUVM providing Python-based UVM 1.2 implementation and PyVSC facilitating constrained randomization and functional coverage. These libraries play a…

    Submitted 6 May, 2024; originally announced July 2024.

    Comments: Published in DVCon U.S. 2024

  2. arXiv:2407.10312  [pdf, other]

    cs.AR cs.AI

    Effective Design Verification -- Constrained Random with Python and Cocotb

    Authors: Deepak Narayan Gadde, Suruchi Kumari, Aman Kumar

    Abstract: Being the most widely used language across the world due to its simplicity and with 35 keywords (v3.7), Python attracts both hardware and software engineers. A Python-based verification environment leverages open-source libraries such as cocotb and cocotb-coverage that enable interfacing the testbenches with any available simulator and facilitate constrained randomization and coverage, respectively. T…

    Submitted 6 May, 2024; originally announced July 2024.

    Comments: Published in DVCon Europe 2023

  3. arXiv:2407.07612  [pdf, other]

    cs.LG cs.AI cs.CL

    Teaching Transformers Causal Reasoning through Axiomatic Training

    Authors: Aniket Vashishtha, Abhinav Kumar, Abbavaram Gowtham Reddy, Vineeth N Balasubramanian, Amit Sharma

    Abstract: For text-based AI systems to interact in the real world, causal reasoning is an essential skill. Since interventional data is costly to generate, we study to what extent an agent can learn causal reasoning from passive data. Specifically, we consider an axiomatic training setup where an agent learns from multiple demonstrations of a causal axiom (or rule), rather than incorporating the axiom as an…

    Submitted 10 July, 2024; originally announced July 2024.

  4. arXiv:2407.06893  [pdf]

    cs.CL cs.CE

    Measuring Sustainability Intention of ESG Fund Disclosure using Few-Shot Learning

    Authors: Mayank Singh, Nazia Nafis, Abhijeet Kumar, Mridul Mishra

    Abstract: The global sustainable fund universe encompasses open-end funds and exchange-traded funds (ETF) that, by prospectus or other regulatory filings, claim to focus on Environment, Social and Governance (ESG). Challengingly, the claims can only be confirmed by examining the textual disclosures to check whether intentionality and ESG focus are present in the investment strategy. Currently, there is no r…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: This paper was presented at 'AI applications in ESG Conference' at IIM Bangalore, India (Nov, 2023)

  5. arXiv:2407.06868  [pdf, other]

    cs.IT cs.LG eess.SP

    Energy Efficient Fair STAR-RIS for Mobile Users

    Authors: Ashok S. Kumar, Nancy Nayak, Sheetal Kalyani, Himal A. Suraweera

    Abstract: In this work, we propose a method to improve the energy efficiency and fairness of simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RIS) for mobile users, ensuring reduced power consumption while maintaining reliable communication. To achieve this, we introduce a new parameter known as the subsurface assignment variable, which determines the number of STAR-RIS e…

    Submitted 9 July, 2024; originally announced July 2024.

  6. arXiv:2407.06110  [pdf, other]

    cs.CV

    FGA: Fourier-Guided Attention Network for Crowd Count Estimation

    Authors: Yashwardhan Chaudhuri, Ankit Kumar, Arun Balaji Buduru, Adel Alshamrani

    Abstract: Crowd counting is gaining societal relevance, particularly in domains of Urban Planning, Crowd Management, and Public Safety. This paper introduces Fourier-guided attention (FGA), a novel attention mechanism for crowd count estimation designed to address the inefficient full-scale global pattern capture in existing works on convolution-based attention networks. FGA efficiently captures multi-scale…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted to IJCNN'24

  7. arXiv:2407.06093  [pdf, other]

    cs.AI

    Artificial Intuition: Efficient Classification of Scientific Abstracts

    Authors: Harsh Sakhrani, Naseela Pervez, Anirudh Ravi Kumar, Fred Morstatter, Alexandra Graddy Reed, Andrea Belz

    Abstract: It is desirable to coarsely classify short scientific texts, such as grant or publication abstracts, for strategic insight or research portfolio management. These texts efficiently transmit dense information to experts possessing a rich body of knowledge to aid interpretation. Yet this task is remarkably difficult to automate because of brevity and the absence of context. To address this gap, we h…

    Submitted 8 July, 2024; originally announced July 2024.

  8. arXiv:2407.04268  [pdf, other]

    cs.LG cs.AI cs.SE

    NeuFair: Neural Network Fairness Repair with Dropout

    Authors: Vishnu Asutosh Dasu, Ashish Kumar, Saeid Tizpaz-Niari, Gang Tan

    Abstract: This paper investigates neuron dropout as a post-processing bias mitigation for deep neural networks (DNNs). Neural-driven software solutions are increasingly applied in socially critical domains with significant fairness implications. While neural networks are exceptionally good at finding statistical patterns from data, they may encode and amplify existing biases from the historical data. Existi…

    Submitted 12 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: Paper accepted at ACM ISSTA 2024

  9. arXiv:2407.03941  [pdf, other]

    cs.SE cs.AI cs.CL

    Narrow Transformer: Starcoder-Based Java-LM For Desktop

    Authors: Kamalkumar Rathinasamy, Balaji A J, Ankush Kumar, Gagan Gayari, Harshini K, Rajab Ali Mondal, Sreenivasa Raghavan K S, Swayam Singh

    Abstract: This paper presents NT-Java-1.1B, an open-source specialized code language model built on StarCoderBase-1.1B, designed for coding tasks in Java programming. NT-Java-1.1B achieves state-of-the-art performance, surpassing its base model and the majority of other models of similar size on the MultiPL-E Java code benchmark. While there have been studies on extending large, generic pre-trained models to improv…

    Submitted 4 July, 2024; originally announced July 2024.

    ACM Class: I.2.7

  10. arXiv:2407.03648  [pdf, other]

    eess.AS cs.SD

    High Fidelity Text-Guided Music Generation and Editing via Single-Stage Flow Matching

    Authors: Gael Le Lan, Bowen Shi, Zhaoheng Ni, Sidd Srinivasan, Anurag Kumar, Brian Ellis, David Kant, Varun Nagaraja, Ernie Chang, Wei-Ning Hsu, Yangyang Shi, Vikas Chandra

    Abstract: We introduce a simple and efficient text-controllable high-fidelity music generation and editing model. It operates on sequences of continuous latent representations from a low frame rate 48 kHz stereo variational auto encoder codec that eliminates the information loss drawback of discrete representations. Based on a diffusion transformer architecture trained on a flow-matching objective the model…

    Submitted 4 July, 2024; originally announced July 2024.

  11. arXiv:2407.01306  [pdf, other]

    cs.LG cs.CR

    Unveiling the Unseen: Exploring Whitebox Membership Inference through the Lens of Explainability

    Authors: Chenxi Li, Abhinav Kumar, Zhen Guo, Jie Hou, Reza Tourani

    Abstract: The increasing prominence of deep learning applications and reliance on personalized data underscore the urgent need to address privacy vulnerabilities, particularly Membership Inference Attacks (MIAs). Despite numerous MIA studies, significant knowledge gaps persist, particularly regarding the impact of hidden features (in isolation) on attack efficacy and insufficient justification for the root…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 20 pages, 10 figures, 4 tables

  12. arXiv:2407.00866  [pdf, other]

    cs.LG

    Silver Linings in the Shadows: Harnessing Membership Inference for Machine Unlearning

    Authors: Nexhi Sula, Abhinav Kumar, Jie Hou, Han Wang, Reza Tourani

    Abstract: With the continued advancement and widespread adoption of machine learning (ML) models across various domains, ensuring user privacy and data security has become a paramount concern. In compliance with data privacy regulations, such as GDPR, a secure machine learning framework should not only grant users the right to request the removal of their contributed data used for model training but also fa…

    Submitted 5 July, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

    Comments: 17 pages, 14 figures, 6 tables

  13. arXiv:2407.00774  [pdf, other]

    quant-ph cs.LG

    Advantages of quantum support vector machine in cross-domain classification of quantum states

    Authors: Diksha Sharma, Vivek Balasaheb Sabale, Parvinder Singh, Atul Kumar

    Abstract: In this study, we use cross-domain classification using quantum machine learning for quantum advantages to address the entanglement versus separability paradigm. We further demonstrate the efficient classification of Bell diagonal states into zero and non-zero discord classes. The inherited structure of quantum states and its relation with a particular class of quantum states are exploited to intu…

    Submitted 30 June, 2024; originally announced July 2024.

  14. arXiv:2407.00537  [pdf, other]

    eess.IV cs.CV cs.LG

    Accelerating Longitudinal MRI using Prior Informed Latent Diffusion

    Authors: Yonatan Urman, Zachary Shah, Ashwin Kumar, Bruno P. Soares, Kawin Setsompop

    Abstract: MRI is a widely used ionization-free soft-tissue imaging modality, often employed repeatedly over a patient's lifetime. However, prolonged scanning durations, among other issues, can limit availability and accessibility. In this work, we aim to substantially reduce scan times by leveraging prior scans of the same patient. These prior scans typically contain considerable shared information with the…

    Submitted 29 June, 2024; originally announced July 2024.

  15. arXiv:2407.00071  [pdf, other]

    cs.AI cs.CL cs.ET cs.LG

    Combinatorial Reasoning: Selecting Reasons in Generative AI Pipelines via Combinatorial Optimization

    Authors: Mert Esencan, Tarun Advaith Kumar, Ata Akbari Asanjan, P. Aaron Lott, Masoud Mohseni, Can Unlu, Davide Venturelli, Alan Ho

    Abstract: Recent Large Language Models (LLMs) have demonstrated impressive capabilities at tasks that require human intelligence and are a significant step towards human-like artificial intelligence (AI). Yet the performance of LLMs at reasoning tasks has been subpar and the reasoning capability of LLMs is a matter of significant debate. While it has been shown that the choice of the prompting technique to…

    Submitted 19 June, 2024; originally announced July 2024.

    Comments: 13 pages, 3 figures

  16. arXiv:2406.17304  [pdf, other]

    cs.CL

    Leveraging LLMs for Dialogue Quality Measurement

    Authors: Jinghan Jia, Abi Komma, Timothy Leffel, Xujun Peng, Ajay Nagesh, Tamer Soliman, Aram Galstyan, Anoop Kumar

    Abstract: In task-oriented conversational AI evaluation, unsupervised methods poorly correlate with human judgments, and supervised approaches lack generalization. Recent advances in large language models (LLMs) show robust zero-shot and few-shot capabilities across NLP tasks. This paper explores using LLMs for automated dialogue quality evaluation, experimenting with various configurations on public and pro…

    Submitted 25 June, 2024; originally announced June 2024.

  17. arXiv:2406.16008  [pdf, other]

    cs.CL cs.AI cs.LG

    Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization

    Authors: Cheng-Yu Hsieh, Yung-Sung Chuang, Chun-Liang Li, Zifeng Wang, Long T. Le, Abhishek Kumar, James Glass, Alexander Ratner, Chen-Yu Lee, Ranjay Krishna, Tomas Pfister

    Abstract: Large language models (LLMs), even when specifically trained to process long input contexts, struggle to capture relevant information located in the middle of their input. This phenomenon has been known as the lost-in-the-middle problem. In this work, we make three contributions. First, we set out to understand the factors that cause this phenomenon. In doing so, we establish a connection between…

    Submitted 3 July, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

    Comments: ACL Findings 2024

  18. arXiv:2406.15649   

    cs.CV

    Efficient Human Pose Estimation: Leveraging Advanced Techniques with MediaPipe

    Authors: Sandeep Singh Sengar, Abhishek Kumar, Owen Singh

    Abstract: This study presents significant enhancements in human pose estimation using the MediaPipe framework. The research focuses on improving accuracy, computational efficiency, and real-time processing capabilities by comprehensively optimising the underlying algorithms. Novel modifications are introduced that substantially enhance pose estimation accuracy across challenging scenarios, such as dynamic m…

    Submitted 13 July, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

    Comments: There is an error in this work. By mistake in Section 3.3, the angle is calculated wrongly

  19. arXiv:2406.15646  [pdf, other]

    cs.CV

    VigilEye -- Artificial Intelligence-based Real-time Driver Drowsiness Detection

    Authors: Sandeep Singh Sengar, Aswin Kumar, Owen Singh

    Abstract: This study presents a novel driver drowsiness detection system that combines deep learning techniques with the OpenCV framework. The system utilises facial landmarks extracted from the driver's face as input to Convolutional Neural Networks trained to recognise drowsiness patterns. The integration of OpenCV enables real-time video processing, making the system suitable for practical implementation…

    Submitted 21 June, 2024; originally announced June 2024.

  20. arXiv:2406.15565  [pdf, other]

    cs.CV cs.LG

    Unseen Object Reasoning with Shared Appearance Cues

    Authors: Paridhi Singh, Arun Kumar

    Abstract: This paper introduces an innovative approach to open world recognition (OWR), where we leverage knowledge acquired from known objects to address the recognition of previously unseen objects. The traditional method of object modeling relies on supervised learning with strict closed-set assumptions, presupposing that objects encountered during inference are already known at the training phase. Howev…

    Submitted 21 June, 2024; originally announced June 2024.

  21. arXiv:2406.14532  [pdf, other]

    cs.LG cs.CL

    RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold

    Authors: Amrith Setlur, Saurabh Garg, Xinyang Geng, Naman Garg, Virginia Smith, Aviral Kumar

    Abstract: Training on model-generated synthetic data is a promising approach for finetuning LLMs, but it remains unclear when it helps or hurts. In this paper, we investigate this question for math reasoning via an empirical study, followed by building a conceptual understanding of our observations. First, we find that while the typical approach of finetuning a model on synthetic correct or positive problem…

    Submitted 20 June, 2024; originally announced June 2024.

  22. arXiv:2406.13236  [pdf, other]

    cs.CL cs.AI

    Data Contamination Can Cross Language Barriers

    Authors: Feng Yao, Yufan Zhuang, Zihao Sun, Sunan Xu, Animesh Kumar, Jingbo Shang

    Abstract: The opacity in developing large language models (LLMs) is raising growing concerns about the potential contamination of public benchmarks in the pre-training data. Existing contamination detection methods are typically based on the text overlap between training and evaluation data, which can be too superficial to reflect deeper forms of contamination. In this paper, we first present a cross-lingua…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 12 pages, 5 figures

  23. arXiv:2406.12644  [pdf, other]

    cs.CL cs.AI

    Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models

    Authors: Devichand Budagam, Sankalp KJ, Ashutosh Kumar, Vinija Jain, Aman Chadha

    Abstract: Assessing the effectiveness of large language models (LLMs) in addressing diverse tasks is essential for comprehending their strengths and weaknesses. Conventional evaluation techniques typically apply a single prompting strategy uniformly across datasets, not considering the varying degrees of task complexity. We introduce the Hierarchical Prompting Taxonomy (HPT), a taxonomy that employs a Hiera…

    Submitted 27 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  24. arXiv:2406.11925  [pdf, other]

    cs.SE cs.AI cs.CL

    DocCGen: Document-based Controlled Code Generation

    Authors: Sameer Pimparkhede, Mehant Kammakomati, Srikanth Tamilselvam, Prince Kumar, Ashok Pon Kumar, Pushpak Bhattacharyya

    Abstract: Recent developments show that Large Language Models (LLMs) produce state-of-the-art performance on natural language (NL) to code generation for resource-rich general-purpose languages like C++, Java, and Python. However, their practical usage for structured domain-specific languages (DSLs) such as YAML, JSON is limited due to domain-specific schema, grammar, and customizations generally unseen by…

    Submitted 3 July, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  25. arXiv:2406.11896  [pdf, other]

    cs.LG

    DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning

    Authors: Hao Bai, Yifei Zhou, Mert Cemri, Jiayi Pan, Alane Suhr, Sergey Levine, Aviral Kumar

    Abstract: Training corpuses for vision language models (VLMs) typically lack sufficient amounts of decision-centric data. This renders off-the-shelf VLMs sub-optimal for decision-making tasks such as in-the-wild device control through graphical user interfaces (GUIs). While training with static demonstrations has shown some promise, we show that such methods fall short for controlling real GUIs due to their…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 11 pages of main text, 28 pages in total

  26. arXiv:2406.11619  [pdf, other]

    eess.AS cs.LG

    AV-CrossNet: an Audiovisual Complex Spectral Mapping Network for Speech Separation By Leveraging Narrow- and Cross-Band Modeling

    Authors: Vahid Ahmadi Kalkhorani, Cheng Yu, Anurag Kumar, Ke Tan, Buye Xu, DeLiang Wang

    Abstract: Adding visual cues to audio-based speech separation can improve separation performance. This paper introduces AV-CrossNet, an audiovisual (AV) system for speech enhancement, target speaker extraction, and multi-talker speaker separation. AV-CrossNet is extended from the CrossNet architecture, which is a recently proposed network that performs complex spectral mapping for speech separation by lever…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 10 pages, 4 Figures, and 4 Tables

  27. arXiv:2406.10935  [pdf, other]

    cs.CV

    Pick-or-Mix: Dynamic Channel Sampling for ConvNets

    Authors: Ashish Kumar, Daneul Kim, Jaesik Park, Laxmidhar Behera

    Abstract: Channel pruning approaches for convolutional neural networks (ConvNets) deactivate the channels, statically or dynamically, and require special implementation. In addition, channel squeezing in representative ConvNets is carried out via 1x1 convolutions, which account for a large portion of computations and network parameters. Given these challenges, we propose an effective multi-purpose module for d…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Published in Computer Vision and Pattern Recognition (CVPR 2024)

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

  28. arXiv:2406.10764  [pdf, other]

    cs.CL

    GNOME: Generating Negotiations through Open-Domain Mapping of Exchanges

    Authors: Darshan Deshpande, Shambhavi Sinha, Anirudh Ravi Kumar, Debaditya Pal, Jonathan May

    Abstract: Language Models have previously shown strong negotiation capabilities in closed domains where the negotiation strategy prediction scope is constrained to a specific setup. In this paper, we first show that these models are not generalizable beyond their original training domain despite their wide-scale pretraining. Following this, we propose an automated framework called GNOME, which processes exi…

    Submitted 15 June, 2024; originally announced June 2024.

  29. arXiv:2406.09329  [pdf, other]

    cs.LG cs.AI

    Is Value Learning Really the Main Bottleneck in Offline RL?

    Authors: Seohong Park, Kevin Frans, Sergey Levine, Aviral Kumar

    Abstract: While imitation learning requires access to high-quality data, offline reinforcement learning (RL) should, in principle, perform similarly or better with substantially lower data quality by using a value function. However, current results indicate that offline RL often performs worse than imitation learning, and it is often unclear what holds back the performance of offline RL. Motivated by this o…

    Submitted 13 June, 2024; originally announced June 2024.

  30. arXiv:2406.06533  [pdf, other]

    cs.AR cs.AI

    Pragmatic Formal Verification Methodology for Clock Domain Crossing (CDC)

    Authors: Aman Kumar, Muhammad Ul Haque Khan, Bijitendra Mittra

    Abstract: Modern System-on-Chip (SoC) designs are becoming more and more complex due to the technology upscaling. SoC designs often operate on multiple asynchronous clock domains, further adding to the complexity of the overall design. To make the devices power efficient, designers take a Globally-Asynchronous Locally-Synchronous (GALS) approach that creates multiple asynchronous domains. These Clock Domain…

    Submitted 20 April, 2024; originally announced June 2024.

    Comments: Published in DVCon Europe 2023

  31. arXiv:2406.06512  [pdf, other]

    cs.CV cs.AI

    Merlin: A Vision Language Foundation Model for 3D Computed Tomography

    Authors: Louis Blankemeier, Joseph Paul Cohen, Ashwin Kumar, Dave Van Veen, Syed Jamal Safdar Gardezi, Magdalini Paschali, Zhihong Chen, Jean-Benoit Delbrouck, Eduardo Reis, Cesar Truyts, Christian Bluethgen, Malte Engmann Kjeldskov Jensen, Sophie Ostmeier, Maya Varma, Jeya Maria Jose Valanarasu, Zhongnan Fang, Zepeng Huo, Zaid Nabulsi, Diego Ardila, Wei-Hung Weng, Edson Amaro Junior, Neera Ahuja, Jason Fries, Nigam H. Shah, Andrew Johnston, et al. (6 additional authors not shown)

    Abstract: Over 85 million computed tomography (CT) scans are performed annually in the US, of which approximately one quarter focus on the abdomen. Given the current radiologist shortage, there is a large impetus to use artificial intelligence to alleviate the burden of interpreting these complex imaging studies. Prior state-of-the-art approaches for automated medical image interpretation leverage vision la…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 18 pages, 7 figures

  32. arXiv:2406.04744  [pdf, other]

    cs.CL

    CRAG -- Comprehensive RAG Benchmark

    Authors: Xiao Yang, Kai Sun, Hao Xin, Yushi Sun, Nikita Bhalla, Xiangsen Chen, Sajal Choudhary, Rongze Daniel Gui, Ziran Will Jiang, Ziyu Jiang, Lingkun Kong, Brian Moran, Jiaqi Wang, Yifan Ethan Xu, An Yan, Chenyu Yang, Eting Yuan, Hanwen Zha, Nan Tang, Lei Chen, Nicolas Scheffer, Yue Liu, Nirav Shah, Rakesh Wanga, Anuj Kumar, et al. (2 additional authors not shown)

    Abstract: Retrieval-Augmented Generation (RAG) has recently emerged as a promising solution to alleviate Large Language Model (LLM)'s deficiency in lack of knowledge. Existing RAG datasets, however, do not adequately represent the diverse and dynamic nature of real-world Question Answering (QA) tasks. To bridge this gap, we introduce the Comprehensive RAG Benchmark (CRAG), a factual question answering bench…

    Submitted 7 June, 2024; originally announced June 2024.

  33. arXiv:2406.04660  [pdf, other]

    eess.AS cs.SD

    URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement

    Authors: Wangyou Zhang, Robin Scheibler, Kohei Saijo, Samuele Cornell, Chenda Li, Zhaoheng Ni, Anurag Kumar, Jan Pirklbauer, Marvin Sach, Shinji Watanabe, Tim Fingscheidt, Yanmin Qian

    Abstract: The last decade has witnessed significant advancements in deep learning-based speech enhancement (SE). However, most existing SE research has limitations on the coverage of SE sub-tasks, data diversity and amount, and evaluation metrics. To fill this gap and promote research toward universal SE, we establish a new SE challenge, named URGENT, to focus on the universality, robustness, and generaliza…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 6 pages, 3 figures, 3 tables. Accepted by Interspeech 2024. An extended version of the accepted manuscript with appendix

  34. arXiv:2406.04413  [pdf, other]

    cs.CV cs.AI

    Efficient 3D-Aware Facial Image Editing via Attribute-Specific Prompt Learning

    Authors: Amandeep Kumar, Muhammad Awais, Sanath Narayan, Hisham Cholakkal, Salman Khan, Rao Muhammad Anwer

    Abstract: Drawing upon StyleGAN's expressivity and disentangled latent space, existing 2D approaches employ textual prompting to edit facial images with different attributes. In contrast, 3D-aware approaches that generate faces at different target poses require attribute-specific classifiers, learning separate model weights for each attribute, and are not scalable for novel attributes. In this work, we prop…

    Submitted 6 June, 2024; originally announced June 2024.

  35. arXiv:2406.03747  [pdf, other]

    cs.CV cs.AI cs.LG

    Instance Segmentation and Teeth Classification in Panoramic X-rays

    Authors: Devichand Budagam, Ayush Kumar, Sayan Ghosh, Anuj Shrivastav, Azamat Zhanatuly Imanbayev, Iskander Rafailovich Akhmetov, Dmitrii Kaplun, Sergey Antonov, Artem Rychenkov, Gleb Cyganov, Aleksandr Sinitca

    Abstract: Teeth segmentation and recognition are critical in various dental applications and dental diagnosis. Automatic and accurate segmentation approaches have been made possible by integrating deep learning models. Although teeth segmentation has been studied in the past, only some techniques were able to effectively classify and segment teeth simultaneously. This article offers a pipeline of two deep l…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Submitted to Expert Systems with Applications Journal

  36. arXiv:2406.00724  [pdf, other]

    cs.HC cs.RO

    Exploring Child-Robot Interaction in Individual and Group settings in India

    Authors: Gayathri Manikutty, Sai Ankith Potapragada, Devasena Pasupuleti, Mahesh S. Unnithan, Arjun Venugopal, Pranav Prabha, Arunav H., Vyshnavi Anil Kumar, Rthuraj P. R., Rao R Bhavani

    Abstract: This study evaluates the effectiveness of child-robot interactions with the HaKsh-E social robot in India, examining both individual and group interaction settings. The research centers on game-based interactions designed to teach hand hygiene to children aged 7-11. Utilizing video analysis, rubric assessments, and post-study questionnaires, the study gathered data from 36 participants. Findings i…

    Submitted 4 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

    Comments: 6 pages, 6 figures, Accepted for presentation at ICRAS 2024 (https://www.icras.org/)

  37. arXiv:2406.00071  [pdf]

    astro-ph.IM astro-ph.SR cs.LG

    Optimizing Photometric Light Curve Analysis: Evaluating Scipy's Minimize Function for Eclipse Mapping of Cataclysmic Variables

    Authors: Anoop Kumar, Madan Mohan Tito Ayyalasomayajula, Dheerendra Panwar, Yeshwanth Vasa

    Abstract: With a particular focus on Scipy's minimize function, the eclipse mapping method is thoroughly researched and implemented utilizing Python and essential libraries. Many optimization techniques are used, including Sequential Least Squares Programming (SLSQP), Nelder-Mead, and Conjugate Gradient (CG). However, for the purpose of examining photometric light curves these methods seek to solve the maxim…

    Submitted 30 May, 2024; originally announced June 2024.

  38. arXiv:2406.00010  [pdf, other]

    cs.IR cs.CL

    EnterpriseEM: Fine-tuned Embeddings for Enterprise Semantic Search

    Authors: Kamalkumar Rathinasamy, Jayarama Nettar, Amit Kumar, Vishal Manchanda, Arun Vijayakumar, Ayush Kataria, Venkateshprasanna Manjunath, Chidambaram GS, Jaskirat Singh Sodhi, Shoeb Shaikh, Wasim Akhtar Khan, Prashant Singh, Tanishq Dattatray Ige, Vipin Tiwari, Rajab Ali Mondal, Harshini K, S Reka, Chetana Amancharla, Faiz ur Rahman, Harikrishnan P A, Indraneel Saha, Bhavya Tiwary, Navin Shankar Patel, Pradeep T S, Balaji A J, et al. (2 additional authors not shown)

    Abstract: Enterprises grapple with the significant challenge of managing proprietary unstructured data, hindering efficient information retrieval. This has led to the emergence of AI-driven information retrieval solutions, designed to adeptly extract relevant insights to address employee inquiries. These solutions often leverage pre-trained embedding models and generative models as foundational components.…

    Submitted 18 May, 2024; originally announced June 2024.

    ACM Class: I.2.7

  39. arXiv:2405.20755  [pdf]

    cs.CL

    Improving code-mixed hate detection by native sample mixing: A case study for Hindi-English code-mixed scenario

    Authors: Debajyoti Mazumder, Aakash Kumar, Jasabanta Patro

    Abstract: Hate detection has long been a challenging task for the NLP community. The task becomes complex in a code-mixed environment because the models must understand the context and the hate expressed through language alteration. Compared to the monolingual setup, we see very little work on code-mixed hate as large-scale annotated hate corpora are unavailable to make the study. To overcome this bottleneck,…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: Generated from XeLaTeX

  40. arXiv:2405.20402  [pdf, other]

    eess.AS cs.SD eess.SP

    Cross-Talk Reduction

    Authors: Zhong-Qiu Wang, Anurag Kumar, Shinji Watanabe

    Abstract: While far-field multi-talker mixtures are recorded, each speaker can wear a close-talk microphone so that close-talk mixtures can be recorded at the same time. Although each close-talk mixture has a high signal-to-noise ratio (SNR) of the wearer, it has a very limited range of applications, as it also contains significant cross-talk speech by other speakers and is not clean enough. In this context…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: in International Joint Conference on Artificial Intelligence (IJCAI), 2024

  41. arXiv:2405.19815  [pdf, other]

    cs.AI cs.LG

    Efficient Stimuli Generation using Reinforcement Learning in Design Verification

    Authors: Deepak Narayan Gadde, Thomas Nalapat, Aman Kumar, Djones Lettnin, Wolfgang Kunz, Sebastian Simon

    Abstract: The increasing design complexity of System-on-Chips (SoCs) has led to significant verification challenges, particularly in meeting coverage targets within a timely manner. At present, coverage closure is heavily dependent on constrained random and coverage driven verification methodologies where the randomized stimuli are bounded to verify certain scenarios and to reach coverage goals. This proces…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted for publication at the 20th International Conference on Synthesis, Modeling, Analysis and Simulation Methods, and Applications to Circuit Design (SMACD'24), Jul 2-5 2024, Volos, Greece

  42. arXiv:2405.18304  [pdf, other]

    cs.CV

    Multi-modal Generation via Cross-Modal In-Context Learning

    Authors: Amandeep Kumar, Muzammal Naseer, Sanath Narayan, Rao Muhammad Anwer, Salman Khan, Hisham Cholakkal

    Abstract: In this work, we study the problem of generating novel images from complex multimodal prompt sequences. While existing methods achieve promising results for text-to-image generation, they often struggle to capture fine-grained details from lengthy prompts and maintain contextual coherence within prompt sequences. Moreover, they often result in misaligned image generation for prompt sequences featu…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Technical Report

  43. arXiv:2405.17401  [pdf, other]

    cs.LG cs.CV stat.ML

    RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control

    Authors: Litu Rout, Yujia Chen, Nataniel Ruiz, Abhishek Kumar, Constantine Caramanis, Sanjay Shakkottai, Wen-Sheng Chu

    Abstract: We propose Reference-Based Modulation (RB-Modulation), a new plug-and-play solution for training-free personalization of diffusion models. Existing training-free approaches exhibit difficulties in (a) style extraction from reference images in the absence of additional style or content text descriptions, (b) unwanted content leakage from reference style images, and (c) effective composition of styl…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Preprint. Under review

  44. arXiv:2405.16282  [pdf, other]

    cs.CL cs.AI cs.LG

    Confidence Under the Hood: An Investigation into the Confidence-Probability Alignment in Large Language Models

    Authors: Abhishek Kumar, Robert Morabito, Sanzhar Umbet, Jad Kabbara, Ali Emami

    Abstract: As the use of Large Language Models (LLMs) becomes more widespread, understanding their self-evaluation of confidence in generated responses becomes increasingly important as it is integral to the reliability of the output of these models. We introduce the concept of Confidence-Probability Alignment, that connects an LLM's internal confidence, quantified by token probabilities, to the confidence c…

    Submitted 15 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

    Comments: 9 pages (excluding references), accepted to ACL 2024 Main Conference

  45. arXiv:2405.14555  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    Subtle Biases Need Subtler Measures: Dual Metrics for Evaluating Representative and Affinity Bias in Large Language Models

    Authors: Abhishek Kumar, Sarfaroz Yunusov, Ali Emami

    Abstract: Research on Large Language Models (LLMs) has often neglected subtle biases that, although less apparent, can significantly influence the models' outputs toward particular social narratives. This study addresses two such biases within LLMs: representative bias, which denotes a tendency of LLMs to generate outputs that mirror the experiences of certain identity groups, and affinity bias, reflecting…

    Submitted 3 June, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: 9 pages (excluding references), accepted to ACL 2024 Main Conference

  46. arXiv:2405.09288  [pdf, other]

    cs.CV

    DeCoDEx: Confounder Detector Guidance for Improved Diffusion-based Counterfactual Explanations

    Authors: Nima Fathi, Amar Kumar, Brennan Nichyporuk, Mohammad Havaei, Tal Arbel

    Abstract: Deep learning classifiers are prone to latching onto dominant confounders present in a dataset rather than on the causal markers associated with the target class, leading to poor generalization and biased predictions. Although explainability via counterfactual image generation has been successful at exposing the problem, bias mitigation strategies that permit accurate explainability in the presenc…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: Accepted to Medical Imaging with Deep Learning (MIDL) 2024

  47. arXiv:2405.08015  [pdf, other]

    cs.LG cs.AI

    A Methodology-Oriented Study of Catastrophic Forgetting in Incremental Deep Neural Networks

    Authors: Ashutosh Kumar, Sonali Agarwal, D Jude Hemanth

    Abstract: Human being and different species of animals having the skills to gather, transferring knowledge, processing, fine-tune and generating information throughout their lifetime. The ability of learning throughout their lifespan is referred as continuous learning which is using neurocognition mechanism. Consequently, in real world computational system of incremental learning autonomous agents also need…

    Submitted 11 May, 2024; originally announced May 2024.

  48. arXiv:2405.07079  [pdf, other]

    cs.SE

    Host-Based Allocators for Device Memory

    Authors: Oren Bell, Ashwin Kumar, Chris Gill

    Abstract: Memory allocation is a fairly mature field of computer science. However, we challenge a prevailing assumption in the literature over the last 50 years which, if reconsidered, necessitates a fundamental reevaluation of many classical memory management algorithms. We pose a model where the allocation algorithm runs on host memory but allocates device memory and so incur the following constraint: the…

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: 9 pages, 4 figures

  49. arXiv:2405.05243  [pdf, other]

    quant-ph cs.LG physics.comp-ph

    Deep learning-based variational autoencoder for classification of quantum and classical states of light

    Authors: Mahesh Bhupati, Abhishek Mall, Anshuman Kumar, Pankaj K. Jha

    Abstract: Advancements in optical quantum technologies have been enabled by the generation, manipulation, and characterization of light, with identification based on its photon statistics. However, characterizing light and its sources through single photon measurements often requires efficient detectors and longer measurement times to obtain high-quality photon statistics. Here we introduce a deep learning-…

    Submitted 8 May, 2024; originally announced May 2024.

  50. arXiv:2405.03948  [pdf, other]

    cs.IR cs.HC

    The Fault in Our Recommendations: On the Perils of Optimizing the Measurable

    Authors: Omar Besbes, Yash Kanoria, Akshit Kumar

    Abstract: Recommendation systems are widespread, and through customized recommendations, promise to match users with options they will like. To that end, data on engagement is collected and used. Most recommendation systems are ranking-based, where they rank and recommend items based on their predicted engagement. However, the engagement signals are often only a crude proxy for utility, as data on the latte…

    Submitted 6 May, 2024; originally announced May 2024.