Showing 1–50 of 464 results for author: Vinay

Searching in archive cs.
  1. arXiv:2408.17095  [pdf, other]

    cs.CV cs.LG

    RISSOLE: Parameter-efficient Diffusion Models via Block-wise Generation and Retrieval-Guidance

    Authors: Avideep Mukherjee, Soumya Banerjee, Vinay P. Namboodiri, Piyush Rai

    Abstract: Diffusion-based models demonstrate impressive generation capabilities. However, they also have a massive number of parameters, resulting in enormous model sizes, thus making them unsuitable for deployment on resource-constraint devices. Block-wise generation can be a promising alternative for designing compact-sized (parameter-efficient) deep generative models since the model can generate one bloc…

    Submitted 30 August, 2024; originally announced August 2024.

  2. arXiv:2408.13714  [pdf, other]

    cs.CV

    TalkLoRA: Low-Rank Adaptation for Speech-Driven Animation

    Authors: Jack Saunders, Vinay Namboodiri

    Abstract: Speech-driven facial animation is important for many applications including TV, film, video games, telecommunication and AR/VR. Recently, transformers have been shown to be extremely effective for this task. However, we identify two issues with the existing transformer-based models. Firstly, they are difficult to adapt to new personalised speaking styles and secondly, they are slow to run for long…

    Submitted 24 August, 2024; originally announced August 2024.

  3. arXiv:2408.09031  [pdf]

    cs.NI

    A Primer on Generative AI for Telecom: From Theory to Practice

    Authors: Xingqin Lin, Lopamudra Kundu, Chris Dick, Maria Amparo Canaveras Galdon, Janaki Vamaraju, Swastika Dutta, Vinay Raman

    Abstract: The rise of generative artificial intelligence (GenAI) is transforming the telecom industry. GenAI models, particularly large language models (LLMs), have emerged as powerful tools capable of driving innovation, improving efficiency, and delivering superior customer services in telecom. This paper provides an overview of GenAI for telecom from theory to practice. We review GenAI models and discuss…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: 7 pages, 6 figures, submitted for possible publication

  4. arXiv:2408.07448  [pdf, other]

    cs.CL cs.AI

    LiveFC: A System for Live Fact-Checking of Audio Streams

    Authors: Venktesh V, Vinay Setty

    Abstract: The advances in the digital era have led to rapid dissemination of information. This has also aggravated the spread of misinformation and disinformation. This has potentially serious consequences, such as civil unrest. While fact-checking aims to combat this, manual fact-checking is cumbersome and not scalable. While automated fact-checking approaches exist, they do not operate in real-time and do…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: Under Review, 11 pages

  5. arXiv:2408.01118  [pdf, other]

    cs.CL

    IAI Group at CheckThat! 2024: Transformer Models and Data Augmentation for Checkworthy Claim Detection

    Authors: Peter Røysland Aarnes, Vinay Setty, Petra Galuščáková

    Abstract: This paper describes IAI group's participation for automated check-worthiness estimation for claims, within the framework of the 2024 CheckThat! Lab "Task 1: Check-Worthiness Estimation". The task involves the automated detection of check-worthy claims in English, Dutch, and Arabic political debates and Twitter data. We utilized various pre-trained generative decoder and encoder transformer models…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: Accepted to CLEF2024 CheckThat!

  6. arXiv:2407.21783  [pdf, other]

    cs.AI cs.CL cs.CV

    The Llama 3 Herd of Models

    Authors: Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, Bethany Biron, Binh Tang, et al. (510 additional authors not shown)

    Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical…

    Submitted 15 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  7. QuestGen: Effectiveness of Question Generation Methods for Fact-Checking Applications

    Authors: Ritvik Setty, Vinay Setty

    Abstract: Verifying fact-checking claims poses a significant challenge, even for humans. Recent approaches have demonstrated that decomposing claims into relevant questions to gather evidence enhances the efficiency of the fact-checking process. In this paper, we provide empirical evidence showing that this question decomposition can be effectively automated. We demonstrate that smaller generative models, f…

    Submitted 1 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

    Comments: Accepted in CIKM 2024 as a short paper 4 pages and 1 page references. Fixed typo in author name

    ACM Class: H.3.3

  8. arXiv:2407.18416  [pdf, other]

    cs.CL cs.AI cs.LG

    PersonaGym: Evaluating Persona Agents and LLMs

    Authors: Vinay Samuel, Henry Peng Zou, Yue Zhou, Shreyas Chaudhari, Ashwin Kalyan, Tanmay Rajpurohit, Ameet Deshpande, Karthik Narasimhan, Vishvak Murahari

    Abstract: Persona agents, which are LLM agents that act according to an assigned persona, have demonstrated impressive contextual response capabilities across various applications. These persona agents offer significant enhancements across diverse sectors, such as education, healthcare, and entertainment, where model developers can align agent responses to different user requirements thereby broadening the…

    Submitted 28 July, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

    Comments: 21 pages, 5 figures

  9. arXiv:2407.14407  [pdf, other]

    quant-ph cs.NI

    Routing in Quantum Networks with End-to-End Knowledge

    Authors: Vinay Kumar, Claudio Cicconetti, Marco Conti, Andrea Passarella

    Abstract: Given the diverse array of physical systems available for quantum computing and the absence of a well-defined quantum internet protocol stack, the design and optimisation of quantum networking protocols remain largely unexplored. To address this, we introduce an approach that facilitates the establishment of paths capable of delivering end-to-end fidelity above a specified threshold, without requi…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: 15 pages, 17 figures

  10. arXiv:2407.13071  [pdf]

    cs.CY cs.CL cs.IR cs.LG cs.SI

    Analysing the Public Discourse around OpenAI's Text-To-Video Model 'Sora' using Topic Modeling

    Authors: Vatsal Vinay Parikh

    Abstract: The recent introduction of OpenAI's text-to-video model Sora has sparked widespread public discourse across online communities. This study aims to uncover the dominant themes and narratives surrounding Sora by conducting topic modeling analysis on a corpus of 1,827 Reddit comments from five relevant subreddits (r/OpenAI, r/technology, r/singularity, r/vfx, and r/ChatGPT). The comments were collect…

    Submitted 29 May, 2024; originally announced July 2024.

  11. arXiv:2407.03122  [pdf, other]

    cs.RO

    IntentionNet: Map-Lite Visual Navigation at the Kilometre Scale

    Authors: Wei Gao, Bo Ai, Joel Loo, Vinay, David Hsu

    Abstract: This work explores the challenges of creating a scalable and robust robot navigation system that can traverse both indoor and outdoor environments to reach distant goals. We propose a navigation system architecture called IntentionNet that employs a monolithic neural network as the low-level planner/controller, and uses a general interface that we call intentions to steer the controller. The paper…

    Submitted 3 July, 2024; originally announced July 2024.

  12. arXiv:2406.14201  [pdf, other]

    cs.CV

    Trusting Semantic Segmentation Networks

    Authors: Samik Some, Vinay P. Namboodiri

    Abstract: Semantic segmentation has become an important task in computer vision with the growth of self-driving cars, medical image segmentation, etc. Although current models provide excellent results, they are still far from perfect and while there has been significant work in trying to improve the performance, both with respect to accuracy and speed of segmentation, there has been little work which analys…

    Submitted 20 June, 2024; originally announced June 2024.

  13. arXiv:2406.10892  [pdf, other]

    cs.LG

    DIPPER: Direct Preference Optimization to Accelerate Primitive-Enabled Hierarchical Reinforcement Learning

    Authors: Utsav Singh, Souradip Chakraborty, Wesley A. Suttle, Brian M. Sadler, Vinay P Namboodiri, Amrit Singh Bedi

    Abstract: Learning control policies to perform complex robotics tasks from human preference data presents significant challenges. On the one hand, the complexity of such tasks typically requires learning policies to perform a variety of subtasks, then combining them to achieve the overall goal. At the same time, comprehensive, well-engineered reward functions are typically unavailable in such problems, whil…

    Submitted 16 June, 2024; originally announced June 2024.

  14. arXiv:2406.10247  [pdf, other]

    cs.CL cs.AI

    QCQA: Quality and Capacity-aware grouped Query Attention

    Authors: Vinay Joshi, Prashant Laddha, Shambhavi Sinha, Om Ji Omer, Sreenivas Subramoney

    Abstract: Excessive memory requirements of key and value features (KV-cache) present significant challenges in the autoregressive inference of large language models (LLMs), restricting both the speed and length of text generation. Approaches such as Multi-Query Attention (MQA) and Grouped Query Attention (GQA) mitigate these challenges by grouping query heads and consequently reducing the number of correspo…

    Submitted 8 June, 2024; originally announced June 2024.

  15. arXiv:2406.05881  [pdf, other]

    cs.LG cs.CL cs.RO

    LGR2: Language Guided Reward Relabeling for Accelerating Hierarchical Reinforcement Learning

    Authors: Utsav Singh, Pramit Bhattacharyya, Vinay P. Namboodiri

    Abstract: Developing interactive systems that leverage natural language instructions to solve complex robotic control tasks has been a long-desired goal in the robotics community. Large Language Models (LLMs) have demonstrated exceptional abilities in handling complex tasks, including logical reasoning, in-context learning, and code generation. However, predicting low-level robotic actions using LLMs poses…

    Submitted 16 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

  16. arXiv:2406.02184  [pdf, other]

    cs.CV

    GraVITON: Graph based garment warping with attention guided inversion for Virtual-tryon

    Authors: Sanhita Pathak, Vinay Kaushik, Brejesh Lall

    Abstract: Virtual try-on, a rapidly evolving field in computer vision, is transforming e-commerce by improving customer experiences through precise garment warping and seamless integration onto the human body. While existing methods such as TPS and flow address the garment warping but overlook the finer contextual details. In this paper, we introduce a novel graph based warping technique which emphasizes th…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 18 pages, 7 Figures and 6 Tables

  17. arXiv:2406.00968  [pdf, other]

    cs.RO cs.HC

    Evaluating MEDIRL: A Replication and Ablation Study of Maximum Entropy Deep Inverse Reinforcement Learning for Human Social Navigation

    Authors: Vinay Gupta, Nihal Gunukula

    Abstract: In this study, we enhance the Maximum Entropy Deep Inverse Reinforcement Learning (MEDIRL) framework, targeting its application in human robot interaction (HRI) for modeling pedestrian behavior in crowded environments. Our work is grounded in the pioneering research by Fahad, Chen, and Guo, and aims to elevate MEDIRL's efficacy in real world HRI settings. We replicated the original MEDIRL model an…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: 14 pages, 13 figures

  18. arXiv:2405.15065  [pdf, other]

    cs.LG

    Direct Preference Optimization With Unobserved Preference Heterogeneity

    Authors: Keertana Chidambaram, Karthik Vinay Seetharaman, Vasilis Syrgkanis

    Abstract: RLHF has emerged as a pivotal step in aligning language models with human objectives and values. It typically involves learning a reward model from human preference data and then using reinforcement learning to update the generative model accordingly. Conversely, Direct Preference Optimization (DPO) directly optimizes the generative model with preference data, skipping reinforcement learning. Howe…

    Submitted 23 May, 2024; originally announced May 2024.

  19. arXiv:2405.11659  [pdf, other]

    cs.RO cs.CV cs.LG

    Auto-Platoon : Freight by example

    Authors: Tharun V. Puthanveettil, Abhijay Singh, Yashveer Jain, Vinay Bukka, Sameer Arjun S

    Abstract: The work introduces a bio-inspired leader-follower system based on an innovative mechanism proposed as software latching that aims to improve collaboration and coordination between a leader agent and the associated autonomous followers. The system utilizes software latching to establish real-time communication and synchronization between the leader and followers. A layered architecture is proposed…

    Submitted 19 May, 2024; originally announced May 2024.

  20. arXiv:2405.07166  [pdf, other]

    cs.CV

    Resource Efficient Perception for Vision Systems

    Authors: A V Subramanyam, Niyati Singal, Vinay K Verma

    Abstract: Despite the rapid advancement in the field of image recognition, the processing of high-resolution imagery remains a computational challenge. However, this processing is pivotal for extracting detailed object insights in areas ranging from autonomous vehicle navigation to medical imaging analyses. Our study introduces a framework aimed at mitigating these challenges by leveraging memory efficient…

    Submitted 12 May, 2024; originally announced May 2024.

  21. The Dark Side of Dataset Scaling: Evaluating Racial Classification in Multimodal Models

    Authors: Abeba Birhane, Sepehr Dehdashtian, Vinay Uday Prabhu, Vishnu Boddeti

    Abstract: Scale the model, scale the data, scale the GPU farms is the reigning sentiment in the world of generative AI today. While model scaling has been extensively studied, data scaling and its downstream impacts on model performance remain under-explored. This is particularly important in the context of multimodal datasets whose main source is the World Wide Web, condensed and packaged as the Common Cra…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: To appear in the proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT 24), June 3 to 6, 2024, Rio de Janeiro, Brazil. arXiv admin note: text overlap with arXiv:2306.13141

  22. arXiv:2404.19482  [pdf, other]

    cs.CL

    FactCheck Editor: Multilingual Text Editor with End-to-End fact-checking

    Authors: Vinay Setty

    Abstract: We introduce 'FactCheck Editor', an advanced text editor designed to automate fact-checking and correct factual inaccuracies. Given the widespread issue of misinformation, often a result of unintentional mistakes by content creators, our tool aims to address this challenge. It supports over 90 languages and utilizes transformer models to assist humans in the labor-intensive process of fact verific…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Accepted in SIGIR 2024 (demo track)

  23. arXiv:2404.19341  [pdf, other]

    cs.CV cs.AI

    Reliable or Deceptive? Investigating Gated Features for Smooth Visual Explanations in CNNs

    Authors: Soham Mitra, Atri Sukul, Swalpa Kumar Roy, Pravendra Singh, Vinay Verma

    Abstract: Deep learning models have achieved remarkable success across diverse domains. However, the intricate nature of these models often impedes a clear understanding of their decision-making processes. This is where Explainable AI (XAI) becomes indispensable, offering intuitive explanations for model decisions. In this work, we propose a simple yet highly effective approach, ScoreCAM++, which introduces…

    Submitted 30 April, 2024; originally announced April 2024.

  24. arXiv:2404.15592  [pdf, other]

    cs.CV cs.AI cs.CL cs.IR cs.LG

    ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction

    Authors: Henry Peng Zou, Vinay Samuel, Yue Zhou, Weizhi Zhang, Liancheng Fang, Zihe Song, Philip S. Yu, Cornelia Caragea

    Abstract: Existing datasets for attribute value extraction (AVE) predominantly focus on explicit attribute values while neglecting the implicit ones, lack product images, are often not publicly available, and lack an in-depth human inspection across diverse domains. To address these limitations, we present ImplicitAVE, the first, publicly available multimodal dataset for implicit attribute value extraction.…

    Submitted 19 July, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: Accepted by ACL 2024 (Findings) - Scores: Soundness - 4/4/4, Dataset - 4/4/4, Overall Assessment - 4/3.5/3.5, Meta - 4

  25. arXiv:2404.13423  [pdf, other]

    cs.LG

    PIPER: Primitive-Informed Preference-based Hierarchical Reinforcement Learning via Hindsight Relabeling

    Authors: Utsav Singh, Wesley A. Suttle, Brian M. Sadler, Vinay P. Namboodiri, Amrit Singh Bedi

    Abstract: In this work, we introduce PIPER: Primitive-Informed Preference-based Hierarchical reinforcement learning via Hindsight Relabeling, a novel approach that leverages preference-based learning to learn a reward model, and subsequently uses this reward model to relabel higher-level replay buffers. Since this reward is unaffected by lower primitive behavior, our relabeling-based approach is able to mit…

    Submitted 16 June, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

  26. arXiv:2404.09136  [pdf, other]

    cs.CL cs.AI cs.LG

    TLDR at SemEval-2024 Task 2: T5-generated clinical-Language summaries for DeBERTa Report Analysis

    Authors: Spandan Das, Vinay Samuel, Shahriar Noroozizadeh

    Abstract: This paper introduces novel methodologies for the Natural Language Inference for Clinical Trials (NLI4CT) task. We present TLDR (T5-generated clinical-Language summaries for DeBERTa Report Analysis) which incorporates T5-model generated premise summaries for improved entailment and contradiction analysis in clinical NLI tasks. This approach overcomes the challenges posed by small context windows a…

    Submitted 14 April, 2024; originally announced April 2024.

    Journal ref: In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 507-516, Mexico City, Mexico. Association for Computational Linguistics

  27. Accessibility in Information Retrieval

    Authors: Leif Azzopardi, Vishwa Vinay

    Abstract: This paper introduces the concept of accessibility from the field of transportation planning and adopts it within the context of Information Retrieval (IR). An analogy is drawn between the fields, which motivates the development of document accessibility measures for IR systems. Considering the accessibility of documents within a collection given an IR System provides a different perspective on th…

    Submitted 12 April, 2024; originally announced April 2024.

    Journal ref: European Conference in Information Retrieval (ECIR) 2008

  28. arXiv:2404.03612  [pdf, other]

    cs.HC

    Creator Hearts: Investigating the Impact Positive Signals from YouTube Creators in Shaping Comment Section Behavior

    Authors: Frederick Choi, Charlotte Lambert, Vinay Koshy, Sowmya Pratipati, Tue Do, Eshwar Chandrasekharan

    Abstract: Much of the research in online moderation focuses on punitive actions. However, emerging research has shown that positive reinforcement is effective at encouraging desirable behavior on online platforms. We extend this research by studying the "creator heart" feature on YouTube, quantifying their primary effects on comments that receive hearts and on videos where hearts have been given. We find th…

    Submitted 4 April, 2024; originally announced April 2024.

  29. arXiv:2404.02587  [pdf, ps, other]

    cs.IR cs.AI

    The Surprising Effectiveness of Rankers Trained on Expanded Queries

    Authors: Abhijit Anand, Venktesh V, Vinay Setty, Avishek Anand

    Abstract: An important problem in text-ranking systems is handling the hard queries that form the tail end of the query distribution. The difficulty may arise due to the presence of uncommon, underspecified, or incomplete queries. In this work, we improve the ranking performance of hard or difficult queries without compromising the performance of other queries. Firstly, we do LLM based query enrichment for…

    Submitted 12 June, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

  30. arXiv:2403.20317  [pdf, other]

    cs.CV

    Convolutional Prompting meets Language Models for Continual Learning

    Authors: Anurag Roy, Riddhiman Moulick, Vinay K. Verma, Saptarshi Ghosh, Abir Das

    Abstract: Continual Learning (CL) enables machine learning models to learn from continuously shifting new training data in absence of data from old tasks. Recently, pretrained vision transformers combined with prompt tuning have shown promise for overcoming catastrophic forgetting in CL. These approaches rely on a pool of learnable prompts which can be inefficient in sharing knowledge across tasks leading t…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: CVPR 2024 Camera Ready

  31. arXiv:2403.18063  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.MM

    Heracles: A Hybrid SSM-Transformer Model for High-Resolution Image and Time-Series Analysis

    Authors: Badri N. Patro, Suhas Ranganath, Vinay P. Namboodiri, Vijay S. Agneeswaran

    Abstract: Transformers have revolutionized image modeling tasks with adaptations like DeIT, Swin, SVT, Biformer, STVit, and FDVIT. However, these models often face challenges with inductive bias and high quadratic complexity, making them less efficient for high-resolution images. State space models (SSMs) such as Mamba, V-Mamba, ViM, and SiMBA offer an alternative to handle high resolution images in compute…

    Submitted 3 June, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  32. arXiv:2403.17169  [pdf, other]

    cs.CL cs.AI

    QuanTemp: A real-world open-domain benchmark for fact-checking numerical claims

    Authors: Venktesh V, Abhijit Anand, Avishek Anand, Vinay Setty

    Abstract: Automated fact checking has gained immense interest to tackle the growing misinformation in the digital era. Existing systems primarily focus on synthetic claims on Wikipedia, and noteworthy progress has also been made on real-world claims. In this work, we release QuanTemp, a diverse, multi-domain dataset focused exclusively on numerical claims, encompassing temporal, statistical and diverse aspe…

    Submitted 1 May, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: 11 pages, 1 figure. Accepted for publication at the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2024)

  33. arXiv:2403.13108  [pdf, ps, other]

    cs.LG cs.CR cs.DC eess.SP

    Resilience in Online Federated Learning: Mitigating Model-Poisoning Attacks via Partial Sharing

    Authors: Ehsan Lari, Reza Arablouei, Vinay Chakravarthi Gogineni, Stefan Werner

    Abstract: Federated learning (FL) allows training machine learning models on distributed data without compromising privacy. However, FL is vulnerable to model-poisoning attacks where malicious clients tamper with their local models to manipulate the global model. In this work, we investigate the resilience of the partial-sharing online FL (PSO-Fed) algorithm against such attacks. PSO-Fed reduces communicati…

    Submitted 16 August, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: 13 pages, 9 figures, Submitted to TSIPN

  34. arXiv:2403.10682  [pdf]

    cond-mat.mtrl-sci cs.LG

    Evaluation of GlassNet for physics-informed machine learning of glass stability and glass-forming ability

    Authors: Sarah I. Allec, Xiaonan Lu, Daniel R. Cassar, Xuan T. Nguyen, Vinay I. Hegde, Thiruvillamalai Mahadevan, Miroslava Peterson, Jincheng Du, Brian J. Riley, John D. Vienna, James E. Saal

    Abstract: Glasses form the basis of many modern applications and also hold great potential for future medical and environmental applications. However, their structural complexity and large composition space make design and optimization challenging for certain applications. Of particular importance for glass processing is an estimate of a given composition's glass-forming ability (GFA). However, there remain…

    Submitted 19 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

  35. arXiv:2403.07611  [pdf, other]

    cs.LG cs.AI

    Efficient Knowledge Deletion from Trained Models through Layer-wise Partial Machine Unlearning

    Authors: Vinay Chakravarthi Gogineni, Esmaeil S. Nadimi

    Abstract: Machine unlearning has garnered significant attention due to its ability to selectively erase knowledge obtained from specific training data samples in an already trained machine learning model. This capability enables data holders to adhere strictly to data protection regulations. However, existing unlearning techniques face practical constraints, often causing performance degradation, demanding…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: 16 pages, 4 figures

  36. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1110 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  37. arXiv:2403.03550  [pdf]

    cs.AI cs.CY cs.HC

    Emotional Manipulation Through Prompt Engineering Amplifies Disinformation Generation in AI Large Language Models

    Authors: Rasita Vinay, Giovanni Spitale, Nikola Biller-Andorno, Federico Germani

    Abstract: This study investigates the generation of synthetic disinformation by OpenAI's Large Language Models (LLMs) through prompt engineering and explores their responsiveness to emotional prompting. Leveraging various LLM iterations using davinci-002, davinci-003, gpt-3.5-turbo and gpt-4, we designed experiments to assess their success in producing disinformation. Our findings, based on a corpus of 19,8…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 14 pages, 3 figures

  38. arXiv:2403.01087  [pdf, other]

    cs.MM cs.CV cs.SD eess.AS

    Towards Accurate Lip-to-Speech Synthesis in-the-Wild

    Authors: Sindhu Hegde, Rudrabha Mukhopadhyay, C. V. Jawahar, Vinay Namboodiri

    Abstract: In this paper, we introduce a novel approach to address the task of synthesizing speech from silent videos of any in-the-wild speaker solely based on lip movements. The traditional approach of directly generating speech from lip videos faces the challenge of not being able to learn a robust language model from speech alone, resulting in unsatisfactory outcomes. To overcome this issue, we propose i…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: 8 pages of content, 1 page of references and 4 figures

    Journal ref: In Proceedings of the 31st ACM International Conference on Multimedia, 2023

  39. arXiv:2402.17424  [pdf]

    cs.CV

    ViTaL: An Advanced Framework for Automated Plant Disease Identification in Leaf Images Using Vision Transformers and Linear Projection For Feature Reduction

    Authors: Abhishek Sebastian, Annis Fathima A, Pragna R, Madhan Kumar S, Yaswanth Kannan G, Vinay Murali

    Abstract: Our paper introduces a robust framework for the automated identification of diseases in plant leaf images. The framework incorporates several key stages to enhance disease recognition accuracy. In the pre-processing phase, a thumbnail resizing technique is employed to resize images, minimizing the loss of critical image details while ensuring computational efficiency. Normalization procedures are…

    Submitted 27 February, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: Accepted and scheduled for presentation at CML 2024, this work will be published as a book chapter in Lecture Notes in Networks and Systems

  40. arXiv:2402.13192  [pdf, other]

    math.PR cs.PF

    Spatial Queues with Nearest Neighbour Shifts

    Authors: B. R. Vinay Kumar, Lasse Leskelä

    Abstract: In this work we study multi-server queues on a Euclidean space. Consider $N$ servers that are distributed uniformly in $[0,1]^d$. Customers (users) arrive at the servers according to independent Poisson processes of intensity $λ$. However, they probabilistically decide whether to join the queue they arrived at, or move to one of the nearest neighbours. The strategy followed by the customers affect…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: A part of this work was accepted to the conference International Teletraffic Congress (ITC 35) held between 3--5 October 2023 in Turin, Italy

    MSC Class: 60K30; 05C80

  41. arXiv:2402.12147  [pdf, other]

    cs.CL cs.AI

    Surprising Efficacy of Fine-Tuned Transformers for Fact-Checking over Larger Language Models

    Authors: Vinay Setty

    Abstract: In this paper, we explore the challenges associated with establishing an end-to-end fact-checking pipeline in a real-world context, covering over 90 languages. Our real-world experimental benchmarks demonstrate that fine-tuning Transformer models specifically for fact-checking tasks, such as claim detection and veracity prediction, provide superior performance over large language models (LLMs) lik…

    Submitted 30 April, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: Accepted in SIGIR 2024 (industry track)

  42. arXiv:2402.11780  [pdf, other]

    cs.AR cs.AI

    CiMNet: Towards Joint Optimization for DNN Architecture and Configuration for Compute-In-Memory Hardware

    Authors: Souvik Kundu, Anthony Sarah, Vinay Joshi, Om J Omer, Sreenivas Subramoney

    Abstract: With the recent growth in demand for large-scale deep neural networks, compute-in-memory (CiM) has come up as a prominent solution to alleviate bandwidth and on-chip interconnect bottlenecks that constrain von Neumann architectures. However, the construction of CiM hardware poses a challenge as any specific memory hierarchy in terms of cache sizes and memory bandwidth at different interfaces may no…

    Submitted 18 March, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

    Comments: 6 pages, 4 figures, 5 tables; Accepted as a full paper by the tinyML Research Symposium 2024

  43. arXiv:2402.11750  [pdf, other]

    cs.CL

    In-Context Learning Demonstration Selection via Influence Analysis

    Authors: Vinay M. S., Minh-Hao Van, Xintao Wu

    Abstract: Large Language Models (LLMs) have showcased their In-Context Learning (ICL) capabilities, enabling few-shot learning without the need for gradient updates. Despite its advantages, the effectiveness of ICL heavily depends on the choice of demonstrations. Selecting the most effective demonstrations for ICL remains a significant research challenge. To tackle this issue, we propose a demonstration sel…

    Submitted 17 June, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

    Comments: 11 pages, 1 figure, and 6 tables

  44. arXiv:2402.07540  [pdf, other]

    cs.HC cs.AI cs.CL

    PKG API: A Tool for Personal Knowledge Graph Management

    Authors: Nolwenn Bernard, Ivica Kostric, Weronika Łajewska, Krisztian Balog, Petra Galuščáková, Vinay Setty, Martin G. Skjæveland

    Abstract: Personal knowledge graphs (PKGs) offer individuals a way to store and consolidate their fragmented personal data in a central place, improving service personalization while maintaining full user control. Despite their potential, practical PKG implementations with user-friendly interfaces remain scarce. This work addresses this gap by proposing a complete solution to represent, manage, and interfac…

    Submitted 12 February, 2024; originally announced February 2024.

  45. arXiv:2402.03947  [pdf, other]

    cs.RO

    Reinforcement Learning for Collision-free Flight Exploiting Deep Collision Encoding

    Authors: Mihir Kulkarni, Kostas Alexis

    Abstract: This work contributes a novel deep navigation policy that enables collision-free flight of aerial robots based on a modular approach exploiting deep collision encoding and reinforcement learning. The proposed solution builds upon a deep collision encoder that is trained on both simulated and real depth images using supervised learning such that it compresses the high-dimensional depth data to a lo…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: 8 pages, 8 figures. Accepted to the IEEE International Conference on Robotics and Automation (ICRA) 2024

  46. Aerial Field Robotics

    Authors: Mihir Kulkarni, Brady Moon, Kostas Alexis, Sebastian Scherer

    Abstract: Aerial field robotics research represents the domain of study that aims to equip unmanned aerial vehicles, and as it pertains to this chapter specifically Micro Aerial Vehicles (MAVs), with the ability to operate in real-life environments that present challenges to safe navigation. We present the key elements of autonomy for MAVs that are resilient to collisions and sensing degradation, while op…

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: Accepted in the Encyclopedia of Robotics, Springer

  47. arXiv:2401.10681  [pdf, other]

    cs.NI

    Maximizing Real-Time Video QoE via Bandwidth Sharing under Markovian setting

    Authors: Sushi Anna George, Vinay Joseph

    Abstract: We consider the problem of optimizing Quality of Experience (QoE) of clients streaming real-time video, served by networks managed by different operators that can share bandwidth with each other. The abundance of real-time video traffic is evident in the popularity of applications like video conferencing and video streaming of live events, which have increased significantly since the recent pandem…

    Submitted 26 January, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2211.06666

  48. arXiv:2401.06126  [pdf, other]

    cs.CV cs.GR

    Dubbing for Everyone: Data-Efficient Visual Dubbing using Neural Rendering Priors

    Authors: Jack Saunders, Vinay Namboodiri

    Abstract: Visual dubbing is the process of generating lip motions of an actor in a video to synchronise with given audio. Recent advances have made progress towards this goal but have not been able to produce an approach suitable for mass adoption. Existing methods are split into either person-generic or person-specific models. Person-specific models produce results almost indistinguishable from reality but…

    Submitted 11 January, 2024; originally announced January 2024.

  49. arXiv:2312.13333  [pdf]

    eess.IV cs.CY

    Responsible Deep Learning for Software as a Medical Device

    Authors: Pratik Shah, Jenna Lester, Jana G Deflino, Vinay Pai

    Abstract: Tools, models and statistical methods for signal processing and medical image analysis and training deep learning models to create research prototypes for eventual clinical applications are of special interest to the biomedical imaging community. But material and optical properties of biological tissues are complex and not easily captured by imaging devices. Added complexity can be introduced by d…

    Submitted 20 December, 2023; originally announced December 2023.

    ACM Class: I.2; K.4.1; J.3; I.4

  50. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.