
Showing 1–21 of 21 results for author: Arora, A

Searching in archive eess.
  1. arXiv:2402.02302  [pdf, other]

    eess.AS cs.CL

    Predicting positive transfer for improved low-resource speech recognition using acoustic pseudo-tokens

    Authors: Nay San, Georgios Paraskevopoulos, Aryaman Arora, Xiluo He, Prabhjot Kaur, Oliver Adams, Dan Jurafsky

    Abstract: While massively multilingual speech models like wav2vec 2.0 XLSR-128 can be directly fine-tuned for automatic speech recognition (ASR), downstream performance can still be relatively poor on languages that are under-represented in the pre-training data. Continued pre-training on 70-200 hours of untranscribed speech in these languages can help -- but what about languages without that much recorded…

    Submitted 3 February, 2024; originally announced February 2024.

    Comments: Accepted for SIGTYP2024

  2. arXiv:2401.13851  [pdf, ps, other]

    cs.SD cs.LG eess.AS

    Scaling NVIDIA's Multi-speaker Multi-lingual TTS Systems with Zero-Shot TTS to Indic Languages

    Authors: Akshit Arora, Rohan Badlani, Sungwon Kim, Rafael Valle, Bryan Catanzaro

    Abstract: In this paper, we describe the TTS models developed by NVIDIA for the MMITS-VC (Multi-speaker, Multi-lingual Indic TTS with Voice Cloning) 2024 Challenge. In Tracks 1 and 2, we utilize RAD-MMM to perform few-shot TTS by training additionally on 5 minutes of target speaker data. In Track 3, we utilize P-Flow to perform zero-shot TTS by training on the challenge dataset as well as external datasets.…

    Submitted 29 January, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

    Comments: Presentation accepted at ICASSP 2024

  3. arXiv:2304.10618  [pdf, other]

    cs.AR eess.SP

    ULEEN: A Novel Architecture for Ultra Low-Energy Edge Neural Networks

    Authors: Zachary Susskind, Aman Arora, Igor D. S. Miranda, Alan T. L. Bacellar, Luis A. Q. Villon, Rafael F. Katopodis, Leandro S. de Araujo, Diego L. C. Dutra, Priscila M. V. Lima, Felipe M. G. Franca, Mauricio Breternitz Jr., Lizy K. John

    Abstract: The deployment of AI models on low-power, real-time edge devices requires accelerators for which energy, latency, and area are all first-order concerns. There are many approaches to enabling deep neural networks (DNNs) in this domain, including pruning, quantization, compression, and binary neural networks (BNNs), but with the emergence of the "extreme edge", there is now a demand for even more ef…

    Submitted 20 April, 2023; originally announced April 2023.

    Comments: 14 pages, 14 figures. Portions of this article draw heavily from arXiv:2203.01479, most notably sections 5E and 5F.2

  4. arXiv:2303.07578  [pdf, ps, other]

    cs.SD cs.LG eess.AS

    VANI: Very-lightweight Accent-controllable TTS for Native and Non-native speakers with Identity Preservation

    Authors: Rohan Badlani, Akshit Arora, Subhankar Ghosh, Rafael Valle, Kevin J. Shih, João Felipe Santos, Boris Ginsburg, Bryan Catanzaro

    Abstract: We introduce VANI, a very lightweight multi-lingual accent controllable speech synthesis system. Our model builds upon disentanglement strategies proposed in RADMMM and supports explicit control of accent, language, speaker and fine-grained $F_0$ and energy features for speech synthesis. We utilize the Indic languages dataset, released for LIMMITS 2023 as part of ICASSP Signal Processing Grand Cha…

    Submitted 13 March, 2023; originally announced March 2023.

    Comments: Presentation accepted at ICASSP 2023

  5. arXiv:2205.05675  [pdf, other]

    cs.CV eess.IV

    NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results

    Authors: Yawei Li, Kai Zhang, Radu Timofte, Luc Van Gool, Fangyuan Kong, Mingxi Li, Songwei Liu, Zongcai Du, Ding Liu, Chenhui Zhou, Jingyi Chen, Qingrui Han, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Yu Qiao, Chao Dong, Long Sun, Jinshan Pan, Yi Zhu, Zhikai Zong, Xiaoxiao Liu, Zheng Hui, Tao Yang, et al. (86 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2022 challenge on efficient single image super-resolution with focus on the proposed solutions and results. The task of the challenge was to super-resolve an input image with a magnification factor of $\times$4 based on pairs of low and corresponding high resolution images. The aim was to design a network for single image super-resolution that achieved improvement of e…

    Submitted 11 May, 2022; originally announced May 2022.

    Comments: Validation code of the baseline model is available at https://github.com/ofsoundof/IMDN. Validation of all submitted models is available at https://github.com/ofsoundof/NTIRE2022_ESR

  6. arXiv:2205.01649  [pdf, other]

    eess.IV cs.CV

    Learning Enriched Features for Fast Image Restoration and Enhancement

    Authors: Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao

    Abstract: Given a degraded input image, image restoration aims to recover the missing high-quality image content. Numerous applications demand effective image restoration, e.g., computational photography, surveillance, autonomous vehicles, and remote sensing. Significant advances in image restoration have been made in recent years, dominated by convolutional neural networks (CNNs). The widely-used CNN-based…

    Submitted 19 April, 2022; originally announced May 2022.

    Comments: This article supersedes arXiv:2003.06792. Accepted for publication in TPAMI

  7. arXiv:2203.10425  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    A Study on Robustness to Perturbations for Representations of Environmental Sound

    Authors: Sangeeta Srivastava, Ho-Hsiang Wu, Joao Rulff, Magdalena Fuentes, Mark Cartwright, Claudio Silva, Anish Arora, Juan Pablo Bello

    Abstract: Audio applications involving environmental sound analysis increasingly use general-purpose audio representations, also known as embeddings, for transfer learning. Recently, Holistic Evaluation of Audio Representations (HEAR) evaluated twenty-nine embedding models on nineteen diverse tasks. However, the evaluation's effectiveness depends on the variation already captured within a given dataset. The…

    Submitted 6 July, 2022; v1 submitted 19 March, 2022; originally announced March 2022.

    Comments: Accepted in EUSIPCO 2022

  8. arXiv:2203.06220  [pdf, other]

    cs.SD cs.NI eess.AS

    Infrastructure-free, Deep Learned Urban Noise Monitoring at $\sim$100mW

    Authors: Jihoon Yun, Sangeeta Srivastava, Dhrubojyoti Roy, Nathan Stohs, Charlie Mydlarz, Mahin Salman, Bea Steers, Juan Pablo Bello, Anish Arora

    Abstract: The Sounds of New York City (SONYC) wireless sensor network (WSN) has been fielded in Manhattan and Brooklyn over the past five years, as part of a larger human-in-the-loop cyber-physical control system for monitoring, analyzing, and mitigating urban noise pollution. We describe the evolution of the 2-tier SONYC WSN from an acoustic data collection fabric into a 3-tier in situ noise complaint moni…

    Submitted 11 March, 2022; originally announced March 2022.

    Comments: Accepted in ICCPS 2022

  9. Learning Sparse Graphs via Majorization-Minimization for Smooth Node Signals

    Authors: Ghania Fatima, Aakash Arora, Prabhu Babu, Petre Stoica

    Abstract: In this letter, we propose an algorithm for learning a sparse weighted graph by estimating its adjacency matrix under the assumption that the observed signals vary smoothly over the nodes of the graph. The proposed algorithm is based on the principle of majorization-minimization (MM), wherein we first obtain a tight surrogate function for the graph learning objective and then solve the resultant s…

    Submitted 6 February, 2022; originally announced February 2022.

  10. arXiv:2112.11716  [pdf, other]

    cs.CV eess.IV

    Comparing radiologists' gaze and saliency maps generated by interpretability methods for chest x-rays

    Authors: Ricardo Bigolin Lanfredi, Ambuj Arora, Trafton Drew, Joyce D. Schroeder, Tolga Tasdizen

    Abstract: The interpretability of medical image analysis models is considered a key research field. We use a dataset of eye-tracking data from five radiologists to compare the outputs of interpretability methods and the heatmaps representing where radiologists looked. We conduct a class-independent analysis of the saliency maps generated by two methods selected from the literature: Grad-CAM and attention ma…

    Submitted 19 April, 2023; v1 submitted 22 December, 2021; originally announced December 2021.

    Comments: This paper was presented as an Extended Abstract at the Gaze Meets ML 2022 Workshop, a NeurIPS 2022 workshop

  11. arXiv:2110.08600  [pdf, other]

    eess.SP math.OC stat.CO

    PDMM: A novel Primal-Dual Majorization-Minimization algorithm for Poisson Phase-Retrieval problem

    Authors: Ghania Fatima, Zongyu Li, Aakash Arora, Prabhu Babu

    Abstract: In this paper, we introduce a novel iterative algorithm for the problem of phase-retrieval where the measurements consist of only the magnitude of linear function of the unknown signal, and the noise in the measurements follow Poisson distribution. The proposed algorithm is based on the principle of majorization-minimization (MM); however, the application of MM here is very novel and distinct from…

    Submitted 16 October, 2021; originally announced October 2021.

  12. arXiv:2109.14900  [pdf, other]

    cs.LG eess.AS

    Impact of Channel Variation on One-Class Learning for Spoof Detection

    Authors: Rohit Arora, Anmol Arora, Rohit Singh Rathore

    Abstract: Margin-based losses, especially one-class classification loss, have improved the generalization capabilities of countermeasure systems (CMs), but their reliability is not tested with spoofing attacks degraded with channel variation. Our experiments aim to tackle this in two ways: first, by investigating the impact of various codec simulations and their corresponding parameters, namely bit-rate, di…

    Submitted 27 June, 2022; v1 submitted 30 September, 2021; originally announced September 2021.

  13. arXiv:2107.09178  [pdf, other]

    cs.AR eess.SP

    Compute RAMs: Adaptable Compute and Storage Blocks for DL-Optimized FPGAs

    Authors: Aman Arora, Bagus Hanindhito, Lizy K. John

    Abstract: The configurable building blocks of current FPGAs -- Logic blocks (LBs), Digital Signal Processing (DSP) slices, and Block RAMs (BRAMs) -- make them efficient hardware accelerators for the rapid-changing world of Deep Learning (DL). Communication between these blocks happens through an interconnect fabric consisting of switching elements spread throughout the FPGA. In this paper, a new block, Comp…

    Submitted 30 September, 2021; v1 submitted 19 July, 2021; originally announced July 2021.

    Comments: 8 pages, IEEE Signal Processing Society's ASILOMAR Conference on Signals, Systems and Computers

  14. arXiv:2006.11789  [pdf, other]

    eess.SY math.OC

    Impact of packet dropouts on the performance of optimal controllers and observers

    Authors: Amanpreet Singh Arora, Sanand Dilip

    Abstract: We investigate the impact of packet dropouts due to non-idealities in communication networks on the performance of optimally derived controllers and observers in a minimax sense. These packet dropouts are modeled by discrete constrained switching signals via directed graphs. We consider time optimal control and estimation, minimum energy and fuel optimization and LQR problems for systems subject t…

    Submitted 21 June, 2020; originally announced June 2020.

  15. arXiv:2006.07898  [pdf, other]

    eess.AS cs.SD

    The JHU Multi-Microphone Multi-Speaker ASR System for the CHiME-6 Challenge

    Authors: Ashish Arora, Desh Raj, Aswin Shanmugam Subramanian, Ke Li, Bar Ben-Yair, Matthew Maciejewski, Piotr Żelasko, Paola García, Shinji Watanabe, Sanjeev Khudanpur

    Abstract: This paper summarizes the JHU team's efforts in tracks 1 and 2 of the CHiME-6 challenge for distant multi-microphone conversational speech diarization and recognition in everyday home environments. We explore multi-array processing techniques at each stage of the pipeline, such as multi-array guided source separation (GSS) for enhancement and acoustic model training data, posterior fusion for spee…

    Submitted 14 June, 2020; originally announced June 2020.

    Comments: Presented at the CHiME-6 workshop (colocated with ICASSP 2020)

  16. arXiv:2004.09249  [pdf, other]

    cs.SD cs.CL eess.AS

    CHiME-6 Challenge:Tackling Multispeaker Speech Recognition for Unsegmented Recordings

    Authors: Shinji Watanabe, Michael Mandel, Jon Barker, Emmanuel Vincent, Ashish Arora, Xuankai Chang, Sanjeev Khudanpur, Vimal Manohar, Daniel Povey, Desh Raj, David Snyder, Aswin Shanmugam Subramanian, Jan Trmal, Bar Ben Yair, Christoph Boeddeker, Zhaoheng Ni, Yusuke Fujita, Shota Horiguchi, Naoyuki Kanda, Takuya Yoshioka, Neville Ryant

    Abstract: Following the success of the 1st, 2nd, 3rd, 4th and 5th CHiME challenges we organize the 6th CHiME Speech Separation and Recognition Challenge (CHiME-6). The new challenge revisits the previous CHiME-5 challenge and further considers the problem of distant multi-microphone conversational speech diarization and recognition in everyday home environments. Speech material is the same as the previous C…

    Submitted 2 May, 2020; v1 submitted 20 April, 2020; originally announced April 2020.

  17. arXiv:2003.07761  [pdf, other]

    eess.IV cs.CV

    CycleISP: Real Image Restoration via Improved Data Synthesis

    Authors: Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao

    Abstract: The availability of large-scale datasets has helped unleash the true potential of deep convolutional neural networks (CNNs). However, for the single-image denoising problem, capturing a real dataset is an unacceptably expensive and cumbersome procedure. Consequently, image denoising algorithms are mostly developed and evaluated on synthetic data that is usually generated with a widespread assumpti…

    Submitted 17 March, 2020; originally announced March 2020.

    Comments: CVPR 2020 (Oral)

  18. arXiv:1912.00501  [pdf, other]

    cs.CV cs.CL cs.LG eess.IV

    Interpreting Context of Images using Scene Graphs

    Authors: Himangi Mittal, Ajith Abraham, Anuja Arora

    Abstract: Understanding a visual scene incorporates objects, relationships, and context. Traditional methods working on an image mostly focus on object detection and fail to capture the relationship between the objects. Relationships can give rich semantic information about the objects in a scene. The context can be conducive to comprehending an image since it will help us to perceive the relation between t…

    Submitted 1 December, 2019; originally announced December 2019.

    Comments: To appear in International Conference on Big Data Analytics (BDA2019) (Accepted)

  19. arXiv:1909.03082  [pdf, other]

    eess.SP cs.LG

    One Size Does Not Fit All: Multi-Scale, Cascaded RNNs for Radar Classification

    Authors: Dhrubojyoti Roy, Sangeeta Srivastava, Aditya Kusupati, Pranshu Jain, Manik Varma, Anish Arora

    Abstract: Edge sensing with micro-power pulse-Doppler radars is an emergent domain in monitoring and surveillance with several smart city applications. Existing solutions for the clutter versus multi-source radar classification task are limited in terms of either accuracy or efficiency, and in some cases, struggle with a trade-off between false alarms and recall of sources. We find that this problem can be…

    Submitted 6 September, 2019; originally announced September 2019.

    Comments: Conditionally accepted to ACM BuildSys 2019

  20. arXiv:1811.00936  [pdf, other]

    cs.SD eess.AS

    Acoustic Features Fusion using Attentive Multi-channel Deep Architecture

    Authors: Gaurav Bhatt, Akshita Gupta, Aditya Arora, Balasubramanian Raman

    Abstract: In this paper, we present a novel deep fusion architecture for audio classification tasks. The multi-channel model presented is formed using deep convolution layers where different acoustic features are passed through each channel. To enable dissemination of information across the channels, we introduce attention feature maps that aid in the alignment of frames. The output of each channel is merge…

    Submitted 2 November, 2018; originally announced November 2018.

    Comments: Accepted in CHiME'18 (Interspeech Workshop)

  21. arXiv:1805.00889  [pdf, other]

    cs.SD cs.CY cs.HC eess.AS

    SONYC: A System for the Monitoring, Analysis and Mitigation of Urban Noise Pollution

    Authors: Juan Pablo Bello, Claudio Silva, Oded Nov, R. Luke DuBois, Anish Arora, Justin Salamon, Charles Mydlarz, Harish Doraiswamy

    Abstract: We present the Sounds of New York City (SONYC) project, a smart cities initiative focused on developing a cyber-physical system for the monitoring, analysis and mitigation of urban noise pollution. Noise pollution is one of the topmost quality of life issues for urban residents in the U.S. with proven effects on health, education, the economy, and the environment. Yet, most cities lack the resourc…

    Submitted 18 May, 2018; v1 submitted 2 May, 2018; originally announced May 2018.

    Comments: Accepted May 2018, Communications of the ACM. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record will be published in Communications of the ACM