Showing 51–100 of 309 results for author: Manocha, D

  1. arXiv:2311.08740 [pdf, other]

    cs.RO

    AdVENTR: Autonomous Robot Navigation in Complex Outdoor Environments

    Authors: Kasun Weerakoon, Adarsh Jagan Sathyamoorthy, Mohamed Elnoor, Dinesh Manocha

    Abstract: We present a novel system, AdVENTR, for autonomous robot navigation in unstructured outdoor environments that consist of uneven and vegetated terrains. Our approach is general and can enable both wheeled and legged robots to handle outdoor terrain complexity including unevenness, surface properties like poor traction, granularity, obstacle stiffness, etc. We use data from sensors including RGB came…

    Submitted 15 November, 2023; originally announced November 2023.

  2. arXiv:2310.16255 [pdf, other]

    cs.CV

    UAV-Sim: NeRF-based Synthetic Data Generation for UAV-based Perception

    Authors: Christopher Maxey, Jaehoon Choi, Hyungtae Lee, Dinesh Manocha, Heesung Kwon

    Abstract: Tremendous variations coupled with large degrees of freedom in UAV-based imaging conditions lead to a significant lack of data in adequately learning UAV-based perception models. Using various synthetic renderers in conjunction with perception models is prevalent to create synthetic data to augment the learning in the ground-based imaging domain. However, severe challenges in the austere UAV-based…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Video Link: https://www.youtube.com/watch?v=ucPzbPLqqpI

  3. arXiv:2310.15799 [pdf, other]

    cs.CL cs.AI

    DALE: Generative Data Augmentation for Low-Resource Legal NLP

    Authors: Sreyan Ghosh, Chandra Kiran Evuru, Sonal Kumar, S Ramaneswaran, S Sakshi, Utkarsh Tyagi, Dinesh Manocha

    Abstract: We present DALE, a novel and effective generative Data Augmentation framework for low-resource LEgal NLP. DALE addresses the challenges existing frameworks pose in generating effective data augmentations of legal documents - legal language, with its specialized vocabulary and complex semantics, morphology, and syntax, does not benefit from data augmentations that merely rephrase the source sentenc…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023 Main Conference. Code: https://github.com/Sreyan88/DALE

  4. arXiv:2310.15264 [pdf, other]

    cs.CL cs.AI

    Towards Possibilities & Impossibilities of AI-generated Text Detection: A Survey

    Authors: Soumya Suvra Ghosal, Souradip Chakraborty, Jonas Geiping, Furong Huang, Dinesh Manocha, Amrit Singh Bedi

    Abstract: Large Language Models (LLMs) have revolutionized the domain of natural language processing (NLP) with remarkable capabilities of generating human-like text responses. However, despite these advancements, several works in the existing literature have raised serious concerns about the potential misuse of LLMs such as spreading misinformation, generating fake news, plagiarism in academia, and contami…

    Submitted 23 October, 2023; originally announced October 2023.

  5. arXiv:2310.14566 [pdf, other]

    cs.CV cs.CL

    HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models

    Authors: Tianrui Guan, Fuxiao Liu, Xiyang Wu, Ruiqi Xian, Zongxia Li, Xiaoyu Liu, Xijun Wang, Lichang Chen, Furong Huang, Yaser Yacoob, Dinesh Manocha, Tianyi Zhou

    Abstract: We introduce HallusionBench, a comprehensive benchmark designed for the evaluation of image-context reasoning. This benchmark presents significant challenges to advanced large visual-language models (LVLMs), such as GPT-4V(Vision), Gemini Pro Vision, Claude 3, and LLaVA-1.5, by emphasizing nuanced understanding and interpretation of visual data. The benchmark comprises 346 images paired with 1129…

    Submitted 25 March, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: Accepted to CVPR 2024

  6. arXiv:2310.10578 [pdf, other]

    eess.SP

    Indoor Wireless Signal Modeling with Smooth Surface Diffraction Effects

    Authors: Ruichen Wang, Samuel Audia, Dinesh Manocha

    Abstract: We present a novel algorithm that enhances the accuracy of electromagnetic field simulations in indoor environments by incorporating the Uniform Geometrical Theory of Diffraction (UTD) for surface diffraction. This additional diffraction phenomenology is important for the design of modern wireless systems and allows us to capture the effects of more complex scene geometries. Central to our methodo…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: 5 pages, 9 figures, conference

  7. arXiv:2310.08753 [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models

    Authors: Sreyan Ghosh, Ashish Seth, Sonal Kumar, Utkarsh Tyagi, Chandra Kiran Evuru, S. Ramaneswaran, S. Sakshi, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha

    Abstract: A fundamental characteristic of audio is its compositional nature. Audio-language models (ALMs) trained using a contrastive approach (e.g., CLAP) that learns a shared representation between audio and language modalities have improved performance in many downstream applications, including zero-shot audio classification, audio retrieval, etc. However, the ability of these models to effectively perfo…

    Submitted 30 July, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: ICLR 2024. Project Page: https://sreyan88.github.io/compa_iclr/

  8. arXiv:2310.07621 [pdf, other]

    cs.RO

    AG-CVG: Coverage Planning with a Mobile Recharging UGV and an Energy-Constrained UAV

    Authors: Nare Karapetyan, Ahmad Bilal Asghar, Amisha Bhaskar, Guangyao Shi, Dinesh Manocha, Pratap Tokekar

    Abstract: In this paper, we present an approach for coverage path planning for a team of an energy-constrained Unmanned Aerial Vehicle (UAV) and an Unmanned Ground Vehicle (UGV). Both the UAV and the UGV have predefined areas that they have to cover. The goal is to perform complete coverage by both robots while minimizing the coverage time. The UGV can also serve as a mobile recharging station. The UAV and…

    Submitted 15 March, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: ICRA 2024 Proceedings

  9. arXiv:2310.00481 [pdf, other]

    cs.RO

    LANCAR: Leveraging Language for Context-Aware Robot Locomotion in Unstructured Environments

    Authors: Chak Lam Shek, Xiyang Wu, Wesley A. Suttle, Carl Busart, Erin Zaroukian, Dinesh Manocha, Pratap Tokekar, Amrit Singh Bedi

    Abstract: Navigating robots through unstructured terrains is challenging, primarily due to the dynamic environmental changes. While humans adeptly navigate such terrains by using context from their observations, creating a similar context-aware navigation system for robots is difficult. The essence of the issue lies in the acquisition and interpretation of contextual information, a task complicated by the i…

    Submitted 19 March, 2024; v1 submitted 30 September, 2023; originally announced October 2023.

  10. arXiv:2309.09836 [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    RECAP: Retrieval-Augmented Audio Captioning

    Authors: Sreyan Ghosh, Sonal Kumar, Chandra Kiran Reddy Evuru, Ramani Duraiswami, Dinesh Manocha

    Abstract: We present RECAP (REtrieval-Augmented Audio CAPtioning), a novel and effective audio captioning system that generates captions conditioned on an input audio and other captions similar to the audio retrieved from a datastore. Additionally, our proposed method can transfer to any domain without the need for any additional fine-tuning. To generate a caption for an audio sample, we leverage an audio-t…

    Submitted 6 June, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

    Comments: ICASSP 2024. Code and data: https://github.com/Sreyan88/RECAP

  11. arXiv:2309.08457 [pdf, other]

    cs.RO

    Sim-to-Real Brush Manipulation using Behavior Cloning and Reinforcement Learning

    Authors: Biao Jia, Dinesh Manocha

    Abstract: Developing proficient brush manipulation capabilities in real-world scenarios is a complex and challenging endeavor, with wide-ranging applications in fields such as art, robotics, and digital design. In this study, we introduce an approach designed to bridge the gap between simulated environments and real-world brush manipulation. Our framework leverages behavior cloning and reinforcement learnin…

    Submitted 15 September, 2023; originally announced September 2023.

  12. arXiv:2309.08214 [pdf, other]

    cs.RO

    MTG: Mapless Trajectory Generator with Traversability Coverage for Outdoor Navigation

    Authors: Jing Liang, Peng Gao, Xuesu Xiao, Adarsh Jagan Sathyamoorthy, Mohamed Elnoor, Ming C. Lin, Dinesh Manocha

    Abstract: We present a novel learning-based trajectory generation algorithm for outdoor robot navigation. Our goal is to compute collision-free paths that also satisfy the environment-specific traversability constraints. Our approach is designed for global planning using limited onboard robot perception in mapless environments while ensuring comprehensive coverage of all traversable directions. Our formulat…

    Submitted 4 March, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

    Comments: 9

  13. arXiv:2309.07832 [pdf, other]

    cs.RO cs.AI

    VAPOR: Legged Robot Navigation in Outdoor Vegetation Using Offline Reinforcement Learning

    Authors: Kasun Weerakoon, Adarsh Jagan Sathyamoorthy, Mohamed Elnoor, Dinesh Manocha

    Abstract: We present VAPOR, a novel method for autonomous legged robot navigation in unstructured, densely vegetated outdoor environments using offline Reinforcement Learning (RL). Our method trains a novel RL policy using an actor-critic network and arbitrary data collected in real outdoor vegetation. Our policy uses height and intensity-based cost maps derived from 3D LiDAR point clouds, a goal cost map,…

    Submitted 19 September, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

  14. arXiv:2309.07014 [pdf, other]

    cs.RO

    Using Lidar Intensity for Robot Navigation

    Authors: Adarsh Jagan Sathyamoorthy, Kasun Weerakoon, Mohamed Elnoor, Dinesh Manocha

    Abstract: We present Multi-Layer Intensity Map, a novel 3D object representation for robot perception and autonomous navigation. Intensity maps consist of multiple stacked layers of 2D grid maps each derived from reflected point cloud intensities corresponding to a certain height interval. The different layers of intensity maps can be used to simultaneously estimate obstacles' height, solidity/density, and…

    Submitted 28 September, 2023; v1 submitted 13 September, 2023; originally announced September 2023.

    Comments: 9 pages, 7 figures

  15. arXiv:2308.12370 [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    AdVerb: Visually Guided Audio Dereverberation

    Authors: Sanjoy Chowdhury, Sreyan Ghosh, Subhrajyoti Dasgupta, Anton Ratnarajah, Utkarsh Tyagi, Dinesh Manocha

    Abstract: We present AdVerb, a novel audio-visual dereverberation framework that uses visual cues in addition to the reverberant sound to estimate clean audio. Although audio-only dereverberation is a well-studied problem, our approach incorporates the complementary visual modality to perform audio dereverberation. Given an image of the environment where the reverberated sound signal has been recorded, AdVe…

    Submitted 23 August, 2023; originally announced August 2023.

    Comments: Accepted at ICCV 2023. For project page, see https://gamma.umd.edu/researchdirections/speech/adverb

  16. arXiv:2308.10103 [pdf, other]

    cs.CV cs.AI cs.CL

    ASPIRE: Language-Guided Data Augmentation for Improving Robustness Against Spurious Correlations

    Authors: Sreyan Ghosh, Chandra Kiran Reddy Evuru, Sonal Kumar, Utkarsh Tyagi, Sakshi Singh, Sanjoy Chowdhury, Dinesh Manocha

    Abstract: Neural image classifiers can often learn to make predictions by overly relying on non-predictive features that are spuriously correlated with the class labels in the training data. This leads to poor performance in real-world atypical scenarios where such features are absent. This paper presents ASPIRE (Language-guided Data Augmentation for SPurIous correlation REmoval), a simple yet effective sol…

    Submitted 6 June, 2024; v1 submitted 19 August, 2023; originally announced August 2023.

    Comments: ACL 2024 Findings. Code: https://github.com/Sreyan88/ASPIRE

  17. arXiv:2308.02585 [pdf, other]

    cs.LG

    PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback

    Authors: Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Dinesh Manocha, Huazheng Wang, Mengdi Wang, Furong Huang

    Abstract: We present a novel unified bilevel optimization-based framework, PARL, formulated to address the recently highlighted critical issue of policy alignment in reinforcement learning using utility or preference-based feedback. We identify a major gap within current algorithmic designs for solving policy alignment due to a lack of precise characterization of the dependence of the alignment obj…

    Submitted 30 April, 2024; v1 submitted 3 August, 2023; originally announced August 2023.

  18. arXiv:2307.12217 [pdf, other]

    cs.CV

    LoLep: Single-View View Synthesis with Locally-Learned Planes and Self-Attention Occlusion Inference

    Authors: Cong Wang, Yu-Ping Wang, Dinesh Manocha

    Abstract: We propose a novel method, LoLep, which regresses Locally-Learned planes from a single RGB image to represent scenes accurately, thus generating better novel views. Without the depth information, regressing appropriate plane locations is a challenging problem. To solve this issue, we pre-partition the disparity space into bins and design a disparity sampler to regress local offsets for multiple pl…

    Submitted 9 August, 2023; v1 submitted 22 July, 2023; originally announced July 2023.

    Comments: Accepted by ICCV 2023

  19. arXiv:2307.09754 [pdf, other]

    cs.RO

    ProNav: Proprioceptive Traversability Estimation for Legged Robot Navigation in Outdoor Environments

    Authors: Mohamed Elnoor, Adarsh Jagan Sathyamoorthy, Kasun Weerakoon, Dinesh Manocha

    Abstract: We propose a novel method, ProNav, which uses proprioceptive signals for traversability estimation in challenging outdoor terrains for autonomous legged robot navigation. Our approach uses sensor data from a legged robot's joint encoders, force, and current sensors to measure the joint positions, forces, and current consumption respectively to accurately assess a terrain's stability, resistance to…

    Submitted 26 January, 2024; v1 submitted 19 July, 2023; originally announced July 2023.

  20. arXiv:2307.01817 [pdf, other]

    cs.CV cs.AI

    Human Trajectory Forecasting with Explainable Behavioral Uncertainty

    Authors: Jiangbei Yue, Dinesh Manocha, He Wang

    Abstract: Human trajectory forecasting helps to understand and predict human behaviors, enabling applications from social robots to self-driving cars, and therefore has been heavily investigated. Most existing methods can be divided into model-free and model-based methods. Model-free methods offer superior prediction accuracy but lack explainability, while model-based methods provide explainability but cann…

    Submitted 4 July, 2023; originally announced July 2023.

  21. arXiv:2306.06236 [pdf, other]

    cs.MA cs.LG cs.RO

    iPLAN: Intent-Aware Planning in Heterogeneous Traffic via Distributed Multi-Agent Reinforcement Learning

    Authors: Xiyang Wu, Rohan Chandra, Tianrui Guan, Amrit Singh Bedi, Dinesh Manocha

    Abstract: Navigating safely and efficiently in dense and heterogeneous traffic scenarios is challenging for autonomous vehicles (AVs) due to their inability to infer the behaviors or intentions of nearby drivers. In this work, we introduce a distributed multi-agent reinforcement learning (MARL) algorithm that can predict trajectories and intents in dense and heterogeneous traffic scenarios. Our approach for…

    Submitted 21 August, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

  22. arXiv:2306.06192 [pdf, other]

    cs.RO cs.AI cs.LG

    Ada-NAV: Adaptive Trajectory Length-Based Sample Efficient Policy Learning for Robotic Navigation

    Authors: Bhrij Patel, Kasun Weerakoon, Wesley A. Suttle, Alec Koppel, Brian M. Sadler, Tianyi Zhou, Amrit Singh Bedi, Dinesh Manocha

    Abstract: Trajectory length stands as a crucial hyperparameter within reinforcement learning (RL) algorithms, significantly contributing to the sample inefficiency in robotics applications. Motivated by the pivotal role trajectory length plays in the training process, we introduce Ada-NAV, a novel adaptive trajectory length scheme designed to enhance the training sample efficiency of RL algorithms in roboti…

    Submitted 14 July, 2024; v1 submitted 9 June, 2023; originally announced June 2023.

    Comments: 11 pages, 9 figures, 2 tables

  23. arXiv:2306.01974 [pdf, other]

    cs.SD eess.AS

    BEDRF: Bidirectional Edge Diffraction Response Function for Interactive Sound Propagation

    Authors: Chunxiao Cao, Zili An, Zhong Ren, Dinesh Manocha, Kun Zhou

    Abstract: We introduce bidirectional edge diffraction response function (BEDRF), a new approach to model wave diffraction around edges with path tracing. The diffraction part of the wave is expressed as an integration on path space, and the wave-edge interaction is expressed using only the localized information around points on the edge similar to a bidirectional scattering distribution function (BSDF) for…

    Submitted 2 June, 2023; originally announced June 2023.

  24. arXiv:2306.00928 [pdf, other]

    cs.CL cs.AI cs.IR

    ACLM: A Selective-Denoising based Generative Data Augmentation Approach for Low-Resource Complex NER

    Authors: Sreyan Ghosh, Utkarsh Tyagi, Manan Suri, Sonal Kumar, S Ramaneswaran, Dinesh Manocha

    Abstract: Complex Named Entity Recognition (NER) is the task of detecting linguistically complex named entities in low-context text. In this paper, we present ACLM (Attention-map aware keyword selection for Conditional Language Model fine-tuning), a novel data augmentation approach based on conditional generation to address the data scarcity problem in low-resource complex NER. ACLM alleviates the context-en…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: ACL 2023 Main Conference

  25. arXiv:2305.12437 [pdf, other]

    cs.CV

    SCP: Soft Conditional Prompt Learning for Aerial Video Action Recognition

    Authors: Xijun Wang, Ruiqi Xian, Tianrui Guan, Fuxiao Liu, Dinesh Manocha

    Abstract: We present a new learning approach, Soft Conditional Prompt Learning (SCP), which leverages the strengths of prompt learning for aerial video action recognition. Our approach is designed to predict the action of each agent by helping the models focus on the descriptions or instructions associated with actions in the input videos for aerial/robot visual perception. Our formulation supports various…

    Submitted 28 August, 2024; v1 submitted 21 May, 2023; originally announced May 2023.

    Comments: IROS2024

  26. arXiv:2305.10647 [pdf, other]

    cs.CL cs.AI cs.IR

    BioAug: Conditional Generation based Data Augmentation for Low-Resource Biomedical NER

    Authors: Sreyan Ghosh, Utkarsh Tyagi, Sonal Kumar, Dinesh Manocha

    Abstract: Biomedical Named Entity Recognition (BioNER) is the fundamental task of identifying named entities from biomedical text. However, BioNER suffers from severe data scarcity and lacks high-quality labeled data due to the highly specialized and expert knowledge required for annotation. Though data augmentation has been shown to be highly effective for low-resource NER in general, existing data augmentation…

    Submitted 17 May, 2023; originally announced May 2023.

    Comments: SIGIR 2023

  27. arXiv:2304.06866 [pdf, other]

    cs.CV

    PMI Sampler: Patch Similarity Guided Frame Selection for Aerial Action Recognition

    Authors: Ruiqi Xian, Xijun Wang, Divya Kothandaraman, Dinesh Manocha

    Abstract: We present a new algorithm for selection of informative frames in video action recognition. Our approach is designed for aerial videos captured using a moving camera where human actors occupy a small spatial resolution of video frames. Our algorithm utilizes the motion bias within aerial videos, which enables the selection of motion-salient frames. We introduce the concept of patch mutual informat…

    Submitted 15 November, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

  28. arXiv:2304.04736 [pdf, other]

    cs.CL cs.AI cs.LG

    On the Possibilities of AI-Generated Text Detection

    Authors: Souradip Chakraborty, Amrit Singh Bedi, Sicheng Zhu, Bang An, Dinesh Manocha, Furong Huang

    Abstract: Our work addresses the critical issue of distinguishing text generated by Large Language Models (LLMs) from human-produced text, a task essential for numerous applications. Despite ongoing debate about the feasibility of such differentiation, we present evidence supporting its consistent achievability, except when human and machine text distributions are indistinguishable across their entire suppo…

    Submitted 2 October, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

  29. arXiv:2303.17778 [pdf, other]

    cs.CV

    CrossLoc3D: Aerial-Ground Cross-Source 3D Place Recognition

    Authors: Tianrui Guan, Aswath Muthuselvam, Montana Hoover, Xijun Wang, Jing Liang, Adarsh Jagan Sathyamoorthy, Damon Conover, Dinesh Manocha

    Abstract: We present CrossLoc3D, a novel 3D place recognition method that solves a large-scale point matching problem in a cross-source setting. Cross-source point cloud data corresponds to point sets captured by depth sensors with different accuracies or from different distances and perspectives. We address the challenges in terms of developing 3D place recognition methods that account for the representati…

    Submitted 29 September, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

  30. arXiv:2303.15060 [pdf, other]

    cs.CV

    TMO: Textured Mesh Acquisition of Objects with a Mobile Device by using Differentiable Rendering

    Authors: Jaehoon Choi, Dongki Jung, Taejae Lee, Sangwook Kim, Youngdong Jung, Dinesh Manocha, Donghwan Lee

    Abstract: We present a new pipeline for acquiring a textured mesh in the wild with a single smartphone which offers access to images, depth maps, and valid poses. Our method first introduces an RGBD-aided structure from motion, which can yield filtered depth maps and refines camera poses guided by corresponding depth. Then, we adopt the neural implicit surface reconstruction method, which allows for high-qu…

    Submitted 27 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR23. Project Page: https://jh-choi.github.io/TMO/

  31. arXiv:2303.14502 [pdf, other]

    cs.RO

    VERN: Vegetation-aware Robot Navigation in Dense Unstructured Outdoor Environments

    Authors: Adarsh Jagan Sathyamoorthy, Kasun Weerakoon, Tianrui Guan, Mason Russell, Damon Conover, Jason Pusey, Dinesh Manocha

    Abstract: We propose a novel method for autonomous legged robot navigation in densely vegetated environments with a variety of pliable/traversable and non-pliable/untraversable vegetation. We present a novel few-shot learning classifier that can be trained on a few hundred RGB images to differentiate flora that can be navigated through, from the ones that must be circumvented. Using the vegetation classific…

    Submitted 25 March, 2023; originally announced March 2023.

    Comments: 8 Pages, 5 figures

  32. PACE: Data-Driven Virtual Agent Interaction in Dense and Cluttered Environments

    Authors: James Mullen, Dinesh Manocha

    Abstract: We present PACE, a novel method for modifying motion-captured virtual agents to interact with and move throughout dense, cluttered 3D scenes. Our approach changes a given motion sequence of a virtual agent as needed to adjust to the obstacles and objects in the environment. We first take the individual frames of the motion sequence most important for modeling interactions with the scene and pair t…

    Submitted 24 March, 2023; originally announced March 2023.

    Journal ref: IEEE Transactions on Visualization and Computer Graphics 29.5 (2023) 2536-2546

  33. arXiv:2303.11444 [pdf, other]

    cs.CV

    Aerial Diffusion: Text Guided Ground-to-Aerial View Translation from a Single Image using Diffusion Models

    Authors: Divya Kothandaraman, Tianyi Zhou, Ming Lin, Dinesh Manocha

    Abstract: We present a novel method, Aerial Diffusion, for generating aerial views from a single ground-view image using text guidance. Aerial Diffusion leverages a pretrained text-image diffusion model for prior knowledge. We address two main challenges: the domain gap between the ground view and the aerial view, and the two views being far apart in the text-image embedding manifold. Our approac…

    Submitted 7 September, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: Code: https://github.com/divyakraman/AerialDiffusion

    Journal ref: Siggraph Asia 2023 (Conference Proceedings, Technical Communications)

  34. arXiv:2303.10521 [pdf, other]

    eess.SP

    Dynamic EM Ray Tracing for Large Urban Scenes with Multiple Receivers

    Authors: Ruichen Wang, Dinesh Manocha

    Abstract: Radio applications are increasingly being used in urban environments for cellular radio systems and safety applications that use vehicle-to-vehicle and vehicle-to-infrastructure communication. We present a novel ray tracing-based radio propagation algorithm that can handle large urban scenes with hundreds or thousands of dynamic objects and receivers. Our approach is based on the use of coherence-based techniques…

    Submitted 14 May, 2023; v1 submitted 18 March, 2023; originally announced March 2023.

    Comments: 7 pages, 14 figures, conference

  35. arXiv:2303.10280 [pdf, other]

    cs.CV

    Synthetic-to-Real Domain Adaptation for Action Recognition: A Dataset and Baseline Performances

    Authors: Arun V. Reddy, Ketul Shah, William Paul, Rohita Mocharla, Judy Hoffman, Kapil D. Katyal, Dinesh Manocha, Celso M. de Melo, Rama Chellappa

    Abstract: Human action recognition is a challenging problem, particularly when there is high variability in factors such as subject appearance, backgrounds and viewpoint. While deep neural networks (DNNs) have been shown to perform well on action recognition tasks, they typically require large amounts of high-quality labeled data to achieve robust performance across a variety of conditions. Synthetic data h…

    Submitted 1 August, 2024; v1 submitted 17 March, 2023; originally announced March 2023.

    Comments: ICRA 2023. The first two authors contributed equally. Dataset available at: https://github.com/reddyav1/RoCoG-v2

  36. arXiv:2303.10133 [pdf, other]

    cs.RO

    DS-MPEPC: Safe and Deadlock-Avoiding Robot Navigation in Cluttered Dynamic Scenes

    Authors: Senthil Hariharan Arul, Jong Jin Park, Dinesh Manocha

    Abstract: We present an algorithm for safe robot navigation in complex dynamic environments using a variant of model predictive equilibrium point control. We use an optimization formulation to navigate robots gracefully in dynamic environments by optimizing over a trajectory cost function at each timestep. We present a novel trajectory cost formulation that significantly reduces the conservative and deadloc…

    Submitted 17 March, 2023; originally announced March 2023.

  37. arXiv:2303.09139 [pdf, other]

    cs.RO

    Real-Time Decentralized Navigation of Nonholonomic Agents Using Shifted Yielding Areas

    Authors: Liang He, Zherong Pan, Dinesh Manocha

    Abstract: We present a lightweight, decentralized algorithm for navigating multiple nonholonomic agents through challenging environments with narrow passages. Our key idea is to allow agents to yield to each other in large open areas instead of narrow passages, to increase the success rate of conventional decentralized algorithms. At pre-processing time, our method computes a medial axis for the freespace.…

    Submitted 16 March, 2023; originally announced March 2023.

  38. arXiv:2303.07622 [pdf, other]

    cs.RO cs.AI cs.LG

    RE-MOVE: An Adaptive Policy Design for Robotic Navigation Tasks in Dynamic Environments via Language-Based Feedback

    Authors: Souradip Chakraborty, Kasun Weerakoon, Prithvi Poddar, Mohamed Elnoor, Priya Narayanan, Carl Busart, Pratap Tokekar, Amrit Singh Bedi, Dinesh Manocha

    Abstract: Reinforcement learning-based policies for continuous control robotic navigation tasks often fail to adapt to changes in the environment during real-time deployment, which may result in catastrophic failures. To address this limitation, we propose a novel approach called RE-MOVE (REquest help and MOVE on) to adapt an already trained policy to real-time changes in the environment without re-training vi…

    Submitted 17 September, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

  39. arXiv:2303.05668 [pdf, other]

    eess.AS cs.AI

    UNFUSED: UNsupervised Finetuning Using SElf supervised Distillation

    Authors: Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha

    Abstract: In this paper, we introduce UnFuSeD, a novel approach to leverage self-supervised learning and reduce the need for large amounts of labeled data for audio classification. Unlike prior works, which directly fine-tune a self-supervised pre-trained encoder on a target dataset, we use the encoder to generate pseudo-labels for unsupervised fine-tuning before the actual fine-tuning step. We first train…

    Submitted 17 May, 2023; v1 submitted 9 March, 2023; originally announced March 2023.

    Comments: ICASSP 2023 SASB Workshop

  40. arXiv:2303.03480 [pdf, other]

    cs.RO cs.AI cs.CL

    Can an Embodied Agent Find Your "Cat-shaped Mug"? LLM-Guided Exploration for Zero-Shot Object Navigation

    Authors: Vishnu Sashank Dorbala, James F. Mullen Jr., Dinesh Manocha

    Abstract: We present LGX (Language-guided Exploration), a novel algorithm for Language-Driven Zero-Shot Object Goal Navigation (L-ZSON), where an embodied agent navigates to a uniquely described target object in a previously unseen environment. Our approach makes use of Large Language Models (LLMs) for this task by leveraging the LLM's commonsense reasoning capabilities for making sequential navigational de…

    Submitted 5 November, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

    Comments: 10 pages

    Journal ref: IEEE Robotics and Automation Letters 9.5 (2024) 4083-4090

  41. arXiv:2303.03387 [pdf, other]

    cs.LG cs.AI cs.CL cs.SI

    CoSyn: Detecting Implicit Hate Speech in Online Conversations Using a Context Synergized Hyperbolic Network

    Authors: Sreyan Ghosh, Manan Suri, Purva Chiniya, Utkarsh Tyagi, Sonal Kumar, Dinesh Manocha

    Abstract: The tremendous growth of social media users interacting in online conversations has led to significant growth in hate speech, affecting people from various demographics. Most of the prior works focus on detecting explicit hate speech, which is overt and leverages hateful phrases, with very little work focusing on detecting hate speech that is implicit or denotes hatred through indirect or coded la…

    Submitted 24 October, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

    Comments: Accepted to EMNLP 2023 Main Conference. Code: https://github.com/Sreyan88/CoSyn

  42. arXiv:2303.02575  [pdf, other]

    cs.CV cs.RO

    MITFAS: Mutual Information based Temporal Feature Alignment and Sampling for Aerial Video Action Recognition

    Authors: Ruiqi Xian, Xijun Wang, Dinesh Manocha

    Abstract: We present a novel approach for action recognition in UAV videos. Our formulation is designed to handle occlusion and viewpoint changes caused by the movement of a UAV. We use the concept of mutual information to compute and align the regions corresponding to human action or motion in the temporal domain. This enables our recognition model to learn from the key features associated with the motion.…

    Submitted 15 November, 2023; v1 submitted 4 March, 2023; originally announced March 2023.

  43. AZTR: Aerial Video Action Recognition with Auto Zoom and Temporal Reasoning

    Authors: Xijun Wang, Ruiqi Xian, Tianrui Guan, Celso M. de Melo, Stephen M. Nogar, Aniket Bera, Dinesh Manocha

    Abstract: We propose a novel approach for aerial video action recognition. Our method is designed for videos captured using UAVs and can run on edge or mobile devices. We present a learning-based approach that uses customized auto zoom to automatically identify the human target and scale it appropriately. This makes it easier to extract the key features and reduces the computational overhead. We also presen…

    Submitted 2 March, 2023; originally announced March 2023.

    Comments: Accepted for publication at ICRA 2023

  44. arXiv:2302.13509  [pdf, other]

    cs.RO

    GeoLCR: Attention-based Geometric Loop Closure and Registration

    Authors: Jing Liang, Sanghyun Son, Ming Lin, Dinesh Manocha

    Abstract: We present a novel algorithm specially designed for loop detection and registration that utilizes Lidar-based perception. Our approach to loop detection involves voxelizing point clouds, followed by an overlap calculation to confirm whether a vehicle has completed a loop. We further enhance the current pose's accuracy via an innovative point-level registration model. The efficacy of our algorithm…

    Submitted 16 July, 2023; v1 submitted 26 February, 2023; originally announced February 2023.

  45. arXiv:2302.02809  [pdf, other]

    eess.AS cs.CV cs.LG cs.MM cs.SD

    Listen2Scene: Interactive material-aware binaural sound propagation for reconstructed 3D scenes

    Authors: Anton Ratnarajah, Dinesh Manocha

    Abstract: We present an end-to-end binaural audio rendering approach (Listen2Scene) for virtual reality (VR) and augmented reality (AR) applications. We propose a novel neural-network-based binaural sound propagation method to generate acoustic effects for indoor 3D models of real environments. Any clean audio or dry audio can be convolved with the generated acoustic effects to render audio corresponding to…

    Submitted 1 February, 2024; v1 submitted 1 February, 2023; originally announced February 2023.

    Comments: Accepted to IEEE VR 2024. Project page: https://anton-jeran.github.io/Listen2Scene/

  46. arXiv:2301.12083  [pdf, other]

    cs.LG math.OC stat.ML

    Beyond Exponentially Fast Mixing in Average-Reward Reinforcement Learning via Multi-Level Monte Carlo Actor-Critic

    Authors: Wesley A. Suttle, Amrit Singh Bedi, Bhrij Patel, Brian M. Sadler, Alec Koppel, Dinesh Manocha

    Abstract: Many existing reinforcement learning (RL) methods employ stochastic gradient iteration on the back end, whose stability hinges upon a hypothesis that the data-generating process mixes exponentially fast with a rate parameter that appears in the step-size selection. Unfortunately, this assumption is violated for large state spaces or settings with sparse rewards, and the mixing time is unknown, mak…

    Submitted 1 February, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

  47. arXiv:2301.12038  [pdf, other]

    cs.LG cs.AI stat.ML

    STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning

    Authors: Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Mengdi Wang, Furong Huang, Dinesh Manocha

    Abstract: Directed Exploration is a crucial challenge in reinforcement learning (RL), especially when rewards are sparse. Information-directed sampling (IDS), which optimizes the information ratio, seeks to do so by augmenting regret with information gain. However, estimating information gain is computationally intractable or relies on restrictive assumptions which prohibit its use in many practical instanc…

    Submitted 18 September, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

  48. arXiv:2212.05360  [pdf, other]

    eess.AS cs.AI cs.LG

    Synthetic Wave-Geometric Impulse Responses for Improved Speech Dereverberation

    Authors: Rohith Aralikatti, Zhenyu Tang, Dinesh Manocha

    Abstract: We present a novel approach to improve the performance of learning-based speech dereverberation using accurate synthetic datasets. Our approach is designed to recover the reverb-free signal from a reverberant speech signal. We show that accurately simulating the low-frequency components of Room Impulse Responses (RIRs) is important to achieving good dereverberation. We use the GWA dataset that con…

    Submitted 10 December, 2022; originally announced December 2022.

    Comments: Submitted to ICASSP 2023

  49. arXiv:2211.04473  [pdf, other]

    cs.SD cs.AI eess.AS

    Towards Improved Room Impulse Response Estimation for Speech Recognition

    Authors: Anton Ratnarajah, Ishwarya Ananthabhotla, Vamsi Krishna Ithapu, Pablo Hoffmann, Dinesh Manocha, Paul Calamia

    Abstract: We propose a novel approach for blind room impulse response (RIR) estimation systems in the context of a downstream application scenario, far-field automatic speech recognition (ASR). We first draw the connection between improved RIR estimation and improved ASR performance, as a means of evaluating neural RIR estimators. We then propose a generative adversarial network (GAN) based architecture tha…

    Submitted 19 March, 2023; v1 submitted 7 November, 2022; originally announced November 2022.

    Comments: Accepted at ICASSP 2023. More results are available at https://anton-jeran.github.io/S2IR/

  50. arXiv:2211.03001  [pdf, other]

    cs.HC

    VRDoc: Gaze-based Interactions for VR Reading Experience

    Authors: Geonsun Lee, Jennifer Healey, Dinesh Manocha

    Abstract: Virtual reality (VR) offers the promise of an infinite office and remote collaboration; however, existing interactions in VR do not strongly support one of the most essential tasks for most knowledge workers: reading. This paper presents VRDoc, a set of gaze-based interaction methods designed to improve the reading experience in VR. We introduce three key components: Gaze Select-and-Snap for docum…

    Submitted 5 November, 2022; originally announced November 2022.

    Comments: 8 pages, 4 figures, ISMAR 2022

    ACM Class: D.2.2; I.3.7