Showing 1–50 of 99 results for author: Narayanan, S

Searching in archive eess.
  1. arXiv:2408.15803  [pdf, other]

    eess.AS cs.AI cs.SD

    ModalityMirror: Improving Audio Classification in Modality Heterogeneity Federated Learning with Multimodal Distillation

    Authors: Tiantian Feng, Tuo Zhang, Salman Avestimehr, Shrikanth S. Narayanan

    Abstract: Multimodal Federated Learning frequently encounters challenges of client modality heterogeneity, leading to undesired performance for the secondary modality in multimodal learning. It is particularly prevalent in audiovisual learning, with audio often assumed to be the weaker modality in recognition tasks. To address this challenge, we introduce ModalityMirror to improve audio model performance by…

    Submitted 28 August, 2024; originally announced August 2024.

  2. arXiv:2407.05133  [pdf, other]

    eess.SY

    Control Density Function for Robust Safety and Convergence

    Authors: Joseph Moyalan, Sriram S. K. S Narayanan, Umesh Vaidya

    Abstract: We introduce a novel approach for safe control design based on the density function. A control density function (CDF) is introduced to synthesize a safe controller for a nonlinear dynamic system. The CDF can be viewed as a dual to the control barrier function (CBF), a popular approach used for safe control design. While the safety certificate using the barrier function is based on the notion of in…

    Submitted 6 July, 2024; originally announced July 2024.

  3. arXiv:2406.08800  [pdf, other]

    cs.SD cs.LG eess.AS

    Can Synthetic Audio From Generative Foundation Models Assist Audio Recognition and Speech Modeling?

    Authors: Tiantian Feng, Dimitrios Dimitriadis, Shrikanth Narayanan

    Abstract: Recent advances in foundation models have enabled audio-generative models that produce high-fidelity sounds associated with music, events, and human actions. Despite the success achieved in modern audio-generative models, the conventional approach to assessing the quality of the audio generation relies heavily on distance metrics like Frechet Audio Distance. In contrast, we aim to evaluate the qua…

    Submitted 29 August, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted to 2024 INTERSPEECH; corrections to ActivityNet labels

  4. arXiv:2406.08644  [pdf, other]

    eess.SP cs.AI cs.SD eess.AS

    Toward Fully-End-to-End Listened Speech Decoding from EEG Signals

    Authors: Jihwan Lee, Aditya Kommineni, Tiantian Feng, Kleanthis Avramidis, Xuan Shi, Sudarsana Kadiri, Shrikanth Narayanan

    Abstract: Speech decoding from EEG signals is a challenging task, where brain activity is modeled to estimate salient characteristics of acoustic stimuli. We propose FESDE, a novel framework for Fully-End-to-end Speech Decoding from EEG signals. Our approach aims to directly reconstruct listened speech waveforms given EEG signals, where no intermediate acoustic feature processing step is required. The propo…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: accepted to Interspeech2024

  5. arXiv:2406.07890  [pdf, other]

    eess.AS cs.CL cs.LG

    Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions

    Authors: Anfeng Xu, Kevin Huang, Tiantian Feng, Lue Shen, Helen Tager-Flusberg, Shrikanth Narayanan

    Abstract: Speech foundation models, trained on vast datasets, have opened unique opportunities in addressing challenging low-resource speech understanding, such as child speech. In this work, we explore the capabilities of speech foundation models on child-adult speaker diarization. We show that exemplary foundation models can achieve 39.5% and 62.3% relative reductions in Diarization Error Rate and Speaker…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024

  6. arXiv:2404.17983  [pdf, other]

    cs.SD cs.CL eess.AS

    TI-ASU: Toward Robust Automatic Speech Understanding through Text-to-speech Imputation Against Missing Speech Modality

    Authors: Tiantian Feng, Xuan Shi, Rahul Gupta, Shrikanth S. Narayanan

    Abstract: Automatic Speech Understanding (ASU) aims at human-like speech interpretation, providing nuanced intent, emotion, sentiment, and content understanding from speech and language (text) content conveyed in speech. Typically, training a robust ASU model relies heavily on acquiring large-scale, high-quality speech and associated transcriptions. However, it is often challenging to collect or use speech…

    Submitted 27 April, 2024; originally announced April 2024.

  7. arXiv:2404.09215  [pdf, other]

    eess.SP math.OC

    Optimum Beamforming and Grating Lobe Mitigation for Intelligent Reflecting Surfaces

    Authors: Sai Sanjay Narayanan, Uday K Khankhoje, Radha Krishna Ganti

    Abstract: Ensuring adequate wireless coverage in upcoming communication technologies such as 6G is expected to be challenging. This is because user demands for higher data rates require an increase in carrier frequencies, which in turn reduce the diffraction effects (and hence coverage) in complex multipath environments. Intelligent reflecting surfaces have been proposed as a way of restoring coverage by adapt…

    Submitted 30 August, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

    Comments: 12 pages, 16 figures

  8. arXiv:2404.02014  [pdf, other]

    eess.SY

    On the Effect of Quantization on Dynamic Mode Decomposition

    Authors: Dipankar Maity, Debdipta Goswami, Sriram Narayanan

    Abstract: Dynamic Mode Decomposition (DMD) is a widely used data-driven algorithm for estimating the Koopman Operator. This paper investigates how the estimation process is affected when the data is quantized. Specifically, we examine the fundamental connection between estimates of the operator obtained from unquantized data and those from quantized data. Furthermore, using the law of large numbers, we demon…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 8 pages, 4 figures

  9. arXiv:2403.14464  [pdf, other]

    eess.SY

    Synthesizing Controller for Safe Navigation using Control Density Function

    Authors: Joseph Moyalan, Sriram S. K. S Narayanan, Andrew Zheng, Umesh Vaidya

    Abstract: We consider the problem of navigating a nonlinear dynamical system from some initial set to some target set while avoiding collision with an unsafe set. We extend the concept of density function to control density function (CDF) for solving navigation problems with safety constraints. The occupancy-based interpretation of the measure associated with the density function is instrumental in imposing…

    Submitted 21 March, 2024; originally announced March 2024.

  10. arXiv:2403.14048  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    The NeurIPS 2023 Machine Learning for Audio Workshop: Affective Audio Benchmarks and Novel Data

    Authors: Alice Baird, Rachel Manzelli, Panagiotis Tzirakis, Chris Gagne, Haoqi Li, Sadie Allen, Sander Dieleman, Brian Kulis, Shrikanth S. Narayanan, Alan Cowen

    Abstract: The NeurIPS 2023 Machine Learning for Audio Workshop brings together machine learning (ML) experts from various audio domains. There are several valuable audio-driven ML tasks, from speech emotion recognition to audio event detection, but the community is sparse compared to other ML areas, e.g., computer vision or natural language processing. A major limitation with audio is the available data; wi…

    Submitted 20 March, 2024; originally announced March 2024.

  11. arXiv:2403.03222  [pdf, other]

    cs.LG cs.AI eess.SP

    Knowledge-guided EEG Representation Learning

    Authors: Aditya Kommineni, Kleanthis Avramidis, Richard Leahy, Shrikanth Narayanan

    Abstract: Self-supervised learning has produced impressive results in multimedia domains of audio, vision and speech. This paradigm is equally, if not more, relevant for the domain of biosignals, owing to the scarcity of labelled data in such scenarios. The ability to leverage large-scale unlabelled data to learn robust representations could help improve the performance of numerous inference tasks on biosig…

    Submitted 14 February, 2024; originally announced March 2024.

    Comments: 6 Pages, 5 figures, Submitted to EMBC 2024

  12. arXiv:2402.09655  [pdf, other]

    eess.SP eess.IV

    Evaluating Atypical Gaze Patterns through Vision Models: The Case of Cortical Visual Impairment

    Authors: Kleanthis Avramidis, Melinda Y. Chang, Rahul Sharma, Mark S. Borchert, Shrikanth Narayanan

    Abstract: A wide range of neurological and cognitive disorders exhibit distinct behavioral markers aside from their clinical manifestations. Cortical Visual Impairment (CVI) is a prime example of such conditions, resulting from damage to visual pathways in the brain, and adversely impacting low- and high-level visual function. The characteristics impacted by CVI are primarily described qualitatively, challe…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: 5 pages, 4 figures, submitted to IEEE EMBC 2024

  13. arXiv:2402.01703  [pdf]

    cs.CY cs.AI cs.LG eess.AS

    A Multi-Perspective Machine Learning Approach to Evaluate Police-Driver Interaction in Los Angeles

    Authors: Benjamin A. T. Grahama, Lauren Brown, Georgios Chochlakis, Morteza Dehghani, Raquel Delerme, Brittany Friedman, Ellie Graeden, Preni Golazizian, Rajat Hebbar, Parsa Hejabi, Aditya Kommineni, Mayagüez Salinas, Michael Sierra-Arévalo, Jackson Trager, Nicholas Weller, Shrikanth Narayanan

    Abstract: Interactions between government officials and civilians affect public wellbeing and the state legitimacy that is necessary for the functioning of democratic society. Police officers, the most visible and contacted agents of the state, interact with the public more than 20 million times a year during traffic stops. Today, these interactions are regularly recorded by body-worn cameras (BWCs), wh…

    Submitted 9 February, 2024; v1 submitted 24 January, 2024; originally announced February 2024.

    Comments: 13 pages

    ACM Class: I.2.0; I.2.7

  14. arXiv:2401.13784  [pdf, other]

    eess.SY

    On the Predictive Capability of Dynamic Mode Decomposition for Nonlinear Periodic Systems with Focus on Orbital Mechanics

    Authors: Sriram Narayanan, Mohamed Naveed Gul Mohamed, Indranil Nayak, Suman Chakravorty, Mrinal Kumar

    Abstract: This paper discusses the predictive capability of Dynamic Mode Decomposition (DMD) in the context of orbital mechanics. The focus is specifically on the Hankel variant of DMD which uses a stacked set of time-delayed observations for system identification and subsequent prediction. A theory on the minimum number of time delays required for accurate reconstruction of periodic trajectories of nonline…

    Submitted 24 January, 2024; originally announced January 2024.

  15. arXiv:2312.06979  [pdf, ps, other]

    eess.IV cs.CV cs.LG

    On the notion of Hallucinations from the lens of Bias and Validity in Synthetic CXR Images

    Authors: Gauri Bhardwaj, Yuvaraj Govindarajulu, Sundaraparipurnan Narayanan, Pavan Kulkarni, Manojkumar Parmar

    Abstract: Medical imaging has revolutionized disease diagnosis, yet the potential is hampered by limited access to diverse and privacy-conscious datasets. Open-source medical datasets, while valuable, suffer from data quality and clinical information disparities. Generative models, such as diffusion models, aim to mitigate these challenges. At Stanford, researchers explored the utility of a fine-tuned Stabl…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: Accepted at 37th Conference on Neural Information Processing Systems (NeurIPS 2023) - "Medical Imaging Meets NeurIPS" Workshop

  16. arXiv:2312.02541  [pdf, other]

    eess.IV cs.CV

    Explainable Severity ranking via pairwise n-hidden comparison: a case study of glaucoma

    Authors: Hong Nguyen, Cuong V. Nguyen, Shrikanth Narayanan, Benjamin Y. Xu, Michael Pazzani

    Abstract: Primary open-angle glaucoma (POAG) is a chronic and progressive optic nerve condition that results in an acquired loss of optic nerve fibers and potential blindness. The gradual onset of glaucoma results in patients progressively losing their vision without being consciously aware of the changes. To diagnose POAG and determine its severity, patients must undergo a comprehensive dilated eye examina…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: 4 pages

  17. arXiv:2310.20292  [pdf, other]

    eess.IV cs.CV

    IARS SegNet: Interpretable Attention Residual Skip connection SegNet for melanoma segmentation

    Authors: Shankara Narayanan V, Sikha OK, Raul Benitez

    Abstract: Skin lesion segmentation plays a crucial role in the computer-aided diagnosis of melanoma. Deep Learning models have shown promise in accurately segmenting skin lesions, but their widespread adoption in real-life clinical settings is hindered by their inherent black-box nature. In domains as critical as healthcare, interpretability is not merely a feature but a fundamental requirement for model ad…

    Submitted 31 October, 2023; originally announced October 2023.

    Comments: Submitted to the journal: Computers in Biology and Medicine

  18. arXiv:2310.01867  [pdf, other]

    eess.AS cs.SD

    Audio-visual child-adult speaker classification in dyadic interactions

    Authors: Anfeng Xu, Kevin Huang, Tiantian Feng, Helen Tager-Flusberg, Shrikanth Narayanan

    Abstract: Interactions involving children span a wide range of important domains from learning to clinical diagnostic and therapeutic contexts. Automated analyses of such interactions are motivated by the need to seek accurate insights and offer scale and robustness across diverse and wide-ranging conditions. Identifying the speech segments belonging to the child is a critical step in such modeling. Convent…

    Submitted 9 October, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: In review for ICASSP 2024, 5 pages

  19. arXiv:2309.15292  [pdf, other]

    cs.LG eess.SP

    Scaling Representation Learning from Ubiquitous ECG with State-Space Models

    Authors: Kleanthis Avramidis, Dominika Kunc, Bartosz Perz, Kranti Adsul, Tiantian Feng, Przemysław Kazienko, Stanisław Saganowski, Shrikanth Narayanan

    Abstract: Ubiquitous sensing from wearable devices in the wild holds promise for enhancing human well-being, from diagnosing clinical conditions and measuring stress to building adaptive health promoting scaffolds. But the large volumes of data therein across heterogeneous contexts pose challenges for conventional supervised learning approaches. Representation Learning from biological signals is an emerging…

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: Pre-print, currently under review

  20. arXiv:2309.08108  [pdf, other]

    cs.SD eess.AS

    Foundation Model Assisted Automatic Speech Emotion Recognition: Transcribing, Annotating, and Augmenting

    Authors: Tiantian Feng, Shrikanth Narayanan

    Abstract: Significant advances are being made in speech emotion recognition (SER) using deep learning models. Nonetheless, training SER systems remains challenging, requiring both time and costly resources. Like many other machine learning tasks, acquiring datasets for SER requires substantial data annotation efforts, including transcription and labeling. These annotation processes present challenges when a…

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: Under review

  21. arXiv:2308.12610  [pdf, other]

    cs.MM cs.SD eess.AS

    Emotion-Aligned Contrastive Learning Between Images and Music

    Authors: Shanti Stewart, Kleanthis Avramidis, Tiantian Feng, Shrikanth Narayanan

    Abstract: Traditional music search engines rely on retrieval methods that match natural language queries with music metadata. There have been increasing efforts to expand retrieval methods to consider the audio characteristics of music itself, using queries of various modalities including text, video, and speech. While most approaches aim to match general music semantics to the input queries, only a few foc…

    Submitted 20 September, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

    Comments: 4 pages + 1 reference page, 1 figure, 3 tables. Under review for publication

  22. arXiv:2307.16398  [pdf, other]

    eess.AS

    Robust Self Supervised Speech Embeddings for Child-Adult Classification in Interactions involving Children with Autism

    Authors: Rimita Lahiri, Tiantian Feng, Rajat Hebbar, Catherine Lord, So Hyun Kim, Shrikanth Narayanan

    Abstract: We address the problem of detecting who spoke when in child-inclusive spoken interactions, i.e., automatic child-adult speaker classification. Interactions involving children are richly heterogeneous due to developmental differences. The presence of neurodiversity, e.g., due to Autism, contributes additional variability. We investigate the impact of additional pre-training with more unlabelled child…

    Submitted 31 July, 2023; originally announced July 2023.

  23. arXiv:2307.04445  [pdf, other]

    cs.LG eess.SP

    Learning Behavioral Representations of Routines From Large-scale Unlabeled Wearable Time-series Data Streams using Hawkes Point Process

    Authors: Tiantian Feng, Brandon M Booth, Shrikanth Narayanan

    Abstract: Continuously-worn wearable sensors enable researchers to collect copious amounts of rich bio-behavioral time series recordings of real-life activities of daily living, offering unprecedented opportunities to infer novel human behavior patterns during daily routines. Existing approaches to routine discovery through bio-behavioral data rely either on pre-defined notions of activities or use addition…

    Submitted 10 July, 2023; originally announced July 2023.

    Comments: 2023 9th ACM SIGKDD International Workshop on Mining and Learning From Time Series (MiLeTS 2023)

  24. arXiv:2306.07791  [pdf, other]

    cs.SD eess.AS

    Unlocking Foundation Models for Privacy-Enhancing Speech Understanding: An Early Study on Low Resource Speech Training Leveraging Label-guided Synthetic Speech Content

    Authors: Tiantian Feng, Digbalay Bose, Xuan Shi, Shrikanth Narayanan

    Abstract: Automatic Speech Understanding (ASU) leverages the power of deep learning models for accurate interpretation of human speech, leading to a wide range of speech applications that enrich the human experience. However, training a robust ASU model requires the curation of a large number of speech samples, creating risks for privacy breaches. In this work, we investigate using foundation models to assi…

    Submitted 13 June, 2023; originally announced June 2023.

  25. PEFT-SER: On the Use of Parameter Efficient Transfer Learning Approaches For Speech Emotion Recognition Using Pre-trained Speech Models

    Authors: Tiantian Feng, Shrikanth Narayanan

    Abstract: Many recent studies have focused on fine-tuning pre-trained models for speech emotion recognition (SER), resulting in promising performance compared to traditional methods that rely largely on low-level, knowledge-inspired acoustic features. These pre-trained speech models learn general-purpose speech representations using self-supervised or weakly-supervised learning objectives from large-scale d…

    Submitted 14 February, 2024; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: This work was accepted to the 11th International Conference on Affective Computing and Intelligent Interaction (ACII), 2023

  26. arXiv:2305.14117  [pdf, other]

    eess.AS cs.LG

    Understanding Spoken Language Development of Children with ASD Using Pre-trained Speech Embeddings

    Authors: Anfeng Xu, Rajat Hebbar, Rimita Lahiri, Tiantian Feng, Lindsay Butler, Lue Shen, Helen Tager-Flusberg, Shrikanth Narayanan

    Abstract: Speech processing techniques are useful for analyzing speech and language development in children with Autism Spectrum Disorder (ASD), who are often varied and delayed in acquiring these skills. Early identification and intervention are crucial, but traditional assessment methodologies such as caregiver reports are not adequate for the requisite behavioral phenotyping. Natural Language Sample (NLS…

    Submitted 31 May, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted to Interspeech 2023, 5 pages

  27. arXiv:2305.11229  [pdf, other]

    cs.SD eess.AS

    TrustSER: On the Trustworthiness of Fine-tuning Pre-trained Speech Embeddings For Speech Emotion Recognition

    Authors: Tiantian Feng, Rajat Hebbar, Shrikanth Narayanan

    Abstract: Recent studies have explored the use of pre-trained embeddings for speech emotion recognition (SER), achieving comparable performance to conventional methods that rely on low-level knowledge-inspired acoustic features. These embeddings are often generated from models trained on large-scale speech datasets using self-supervised or weakly-supervised learning objectives. Despite the significant advan…

    Submitted 18 May, 2023; originally announced May 2023.

  28. arXiv:2305.07850  [pdf]

    eess.IV cs.CV

    Squeeze Excitation Embedded Attention UNet for Brain Tumor Segmentation

    Authors: Gaurav Prasanna, John Rohit Ernest, Lalitha G, Sathiya Narayanan

    Abstract: Deep Learning based techniques have gained significance over the past few years in the field of medicine. They are used in various applications such as classifying medical images, segmentation and identification. Existing architectures such as UNet, Attention UNet and Attention Residual UNet are already used for brain tumor segmentation, but none o…

    Submitted 13 May, 2023; originally announced May 2023.

  29. arXiv:2305.02938  [pdf, other]

    cs.RO eess.SY

    Off-Road Navigation of Legged Robots Using Linear Transfer Operators

    Authors: Joseph Moyalan, Andrew Zheng, Sriram S. K. S Narayanan, Umesh Vaidya

    Abstract: This paper presents the implementation of off-road navigation on legged robots using convex optimization through linear transfer operators. Given a traversability measure that captures the off-road environment, we lift the navigation problem into the density space using the Perron-Frobenius (P-F) operator. This allows the problem formulation to be represented as a convex optimization. Due to the o…

    Submitted 4 May, 2023; originally announced May 2023.

  30. arXiv:2304.08614  [pdf, ps, other]

    eess.SP cs.LG

    Signal Processing Grand Challenge 2023 -- e-Prevention: Sleep Behavior as an Indicator of Relapses in Psychotic Patients

    Authors: Kleanthis Avramidis, Kranti Adsul, Digbalay Bose, Shrikanth Narayanan

    Abstract: This paper presents the approach and results of USC SAIL's submission to the Signal Processing Grand Challenge 2023 - e-Prevention (Task 2), on detecting relapses in psychotic patients. Relapse prediction has proven to be challenging, primarily due to the heterogeneity of symptoms and responses to treatment between individuals. We address these challenges by investigating the use of sleep behavior…

    Submitted 17 April, 2023; originally announced April 2023.

    Comments: 2 pages, 1 table, ICASSP 2023, Grand Challenges Track

  31. Designing and Evaluating Speech Emotion Recognition Systems: A reality check case study with IEMOCAP

    Authors: Nikolaos Antoniou, Athanasios Katsamanis, Theodoros Giannakopoulos, Shrikanth Narayanan

    Abstract: There is an imminent need for guidelines and standard test sets to allow direct and fair comparisons of speech emotion recognition (SER). While resources, such as the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database, have emerged as widely-adopted reference corpora for researchers to develop and test models for SER, published work reveals a wide range of assumptions and variety in it…

    Submitted 3 April, 2023; originally announced April 2023.

    Comments: Accepted at ICASSP 2023

  32. arXiv:2302.07315  [pdf, other]

    eess.AS cs.LG cs.SD

    A dataset for Audio-Visual Sound Event Detection in Movies

    Authors: Rajat Hebbar, Digbalay Bose, Krishna Somandepalli, Veena Vijai, Shrikanth Narayanan

    Abstract: Audio event detection is a widely studied audio processing task, with applications ranging from self-driving cars to healthcare. In-the-wild datasets such as Audioset have propelled research in this field. However, many efforts typically involve manual annotation and verification, which is expensive to perform at scale. Movies depict various real-life and fictional scenarios which makes them a ric…

    Submitted 14 February, 2023; originally announced February 2023.

  33. arXiv:2212.09090  [pdf, other]

    cs.SD cs.MM eess.AS

    Exploring Workplace Behaviors through Speaking Patterns using Large-scale Multimodal Wearable Recordings: A Study of Healthcare Providers

    Authors: Tiantian Feng, Shrikanth Narayanan

    Abstract: Interpersonal spoken communication is central to human interaction and the exchange of information. Such interactive processes involve not only speech and spoken language but also non-verbal cues such as hand gestures, facial expressions, and nonverbal vocalization, that are used to express feelings and provide feedback. These multimodal communication signals carry a variety of information about t…

    Submitted 18 December, 2022; originally announced December 2022.

  34. arXiv:2212.09006  [pdf, other]

    cs.SD cs.LG eess.AS

    A Review of Speech-centric Trustworthy Machine Learning: Privacy, Safety, and Fairness

    Authors: Tiantian Feng, Rajat Hebbar, Nicholas Mehlman, Xuan Shi, Aditya Kommineni, Shrikanth Narayanan

    Abstract: Speech-centric machine learning systems have revolutionized many leading domains ranging from transportation and healthcare to education and defense, profoundly changing how people live, work, and interact with each other. However, recent studies have demonstrated that many speech-centric ML systems may need to be considered more trustworthy for broader deployment. Specifically, concerns over priv…

    Submitted 16 April, 2023; v1 submitted 17 December, 2022; originally announced December 2022.

    Journal ref: APSIPA Transactions on Signal and Information Processing, vol. 12, no. 3, 2023

  35. arXiv:2211.13868  [pdf, other]

    cs.SD eess.AS

    Can Knowledge of End-to-End Text-to-Speech Models Improve Neural MIDI-to-Audio Synthesis Systems?

    Authors: Xuan Shi, Erica Cooper, Xin Wang, Junichi Yamagishi, Shrikanth Narayanan

    Abstract: With the similarity between music and speech synthesis from symbolic input and the rapid development of text-to-speech (TTS) techniques, it is worthwhile to explore ways to improve the MIDI-to-audio performance by borrowing from TTS techniques. In this study, we analyze the shortcomings of a TTS-based MIDI-to-audio system and improve it in terms of feature computation, model selection, and trainin…

    Submitted 20 March, 2023; v1 submitted 24 November, 2022; originally announced November 2022.

    Comments: Accepted by ICASSP 2023

  36. arXiv:2211.03279  [pdf, other]

    eess.AS cs.SD

    A Context-Aware Computational Approach for Measuring Vocal Entrainment in Dyadic Conversations

    Authors: Rimita Lahiri, Md Nasir, Catherine Lord, So Hyun Kim, Shrikanth Narayanan

    Abstract: Vocal entrainment is a social adaptation mechanism in human interaction, knowledge of which can offer useful insights into an individual's cognitive-behavioral characteristics. We propose a context-aware approach for measuring vocal entrainment in dyadic conversations. We use conformers (a combination of convolutional network and transformer) for capturing both short-term and long-term conversational…

    Submitted 6 November, 2022; originally announced November 2022.

  37. arXiv:2210.15828  [pdf, other]

    cs.SD cs.MM eess.AS

    On the Role of Visual Context in Enriching Music Representations

    Authors: Kleanthis Avramidis, Shanti Stewart, Shrikanth Narayanan

    Abstract: Human perception and experience of music is highly context-dependent. Contextual variability contributes to differences in how we interpret and interact with music, challenging the design of robust models for information retrieval. Incorporating multimodal context from diverse sources provides a promising approach toward modeling this variability. Music presented in media such as movies and music…

    Submitted 27 October, 2022; originally announced October 2022.

    Comments: 5 pages, 4 figures, 1 table

  38. arXiv:2210.15826  [pdf, other]

    eess.SP cs.HC

    Multimodal Estimation of Change Points of Physiological Arousal in Drivers

    Authors: Kleanthis Avramidis, Tiantian Feng, Digbalay Bose, Shrikanth Narayanan

    Abstract: Detecting unsafe driving states, such as stress, drowsiness, and fatigue, is an important component of ensuring driving safety and an essential prerequisite for automatic intervention systems in vehicles. These concerning conditions are primarily connected to the driver's low or high arousal levels. In this study, we describe a framework for processing multimodal physiological time-series from wea…

    Submitted 27 October, 2022; originally announced October 2022.

    Comments: 5 pages, 3 tables, 4 figures

  39. arXiv:2210.15707  [pdf, other]

    cs.SD cs.DC eess.AS

    FedAudio: A Federated Learning Benchmark for Audio Tasks

    Authors: Tuo Zhang, Tiantian Feng, Samiul Alam, Sunwoo Lee, Mi Zhang, Shrikanth S. Narayanan, Salman Avestimehr

    Abstract: Federated learning (FL) has gained substantial attention in recent years due to the data privacy concerns related to the pervasiveness of consumer devices that continuously collect data from users. While a number of FL benchmarks have been developed to facilitate FL research, none of them include audio data and audio-related tasks. In this paper, we fill this critical gap by introducing a new FL b…

    Submitted 8 February, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

  40. arXiv:2209.11896  [pdf, other]

    eess.IV cs.CV eess.AS

    Unsupervised active speaker detection in media content using cross-modal information

    Authors: Rahul Sharma, Shrikanth Narayanan

    Abstract: We present a cross-modal unsupervised framework for active speaker detection in media content such as TV shows and movies. Machine learning advances have enabled impressive performance in identifying individuals from speech and facial images. We leverage speaker identity information from speech and faces, and formulate active speaker detection as a speech-face assignment task such that the active…

    Submitted 23 September, 2022; originally announced September 2022.

    Comments: Under review at IEEE Transactions on Image Processing

  41. arXiv:2209.05273  [pdf, other]

    eess.AS

    The 2022 Far-field Speaker Verification Challenge: Exploring domain mismatch and semi-supervised learning under the far-field scenario

    Authors: Xiaoyi Qin, Ming Li, Hui Bu, Shrikanth Narayanan, Haizhou Li

    Abstract: FFSVC2022 is the second challenge of far-field speaker verification. FFSVC2022 provides the fully-supervised far-field speaker verification to further explore the far-field scenario and proposes semi-supervised far-field speaker verification. In contrast to FFSVC2020, FFSVC2022 focuses on the single-channel scenario. In addition, a supplementary set for the FFSVC2020 dataset is released this year. T…

    Submitted 15 September, 2022; v1 submitted 12 September, 2022; originally announced September 2022.

  42. arXiv:2207.04565  [pdf, other]

    eess.IV cs.LG

    Automating Detection of Papilledema in Pediatric Fundus Images with Explainable Machine Learning

    Authors: Kleanthis Avramidis, Mohammad Rostami, Melinda Chang, Shrikanth Narayanan

    Abstract: Papilledema is an ophthalmic neurologic disorder in which increased intracranial pressure leads to swelling of the optic nerves. Undiagnosed papilledema in children may lead to blindness and may be a sign of life-threatening conditions, such as brain tumors. Robust and accurate clinical diagnosis of this syndrome can be facilitated by automated analysis of fundus images using deep learning, especi…

    Submitted 10 July, 2022; originally announced July 2022.

    Comments: 5 pages, 4 figures, 2 tables, 2022 IEEE International Conference on Image Processing (ICIP)

  43. arXiv:2207.00190  [pdf]

    eess.SP

    Range of Motion Sensors for Monitoring Recovery of Total Knee Arthroplasty

    Authors: Minh Cao, Brett Bailey, Wenhao Zhang, Solana Fernandez, Aaron Han, Smiti Narayanan, Shrineel Patel, Steven Saletta, Alexandra Stavrakis, Stephen Speicher, Stephanie Seidlits, Arash Naeim, Ramin Ramezani

    Abstract: A low-cost, accurate device to measure and record knee range of motion (ROM) is essential to improving confidence in at-home rehabilitation. Such a device aims to reduce hospital stay duration and overall medical cost after Total Knee Arthroplasty (TKA) procedures. The shift in Medicare funding from pay-as-you-go to the Bundled Payments for Care Improvement (BPCI) has created a push towards at-home c…

    Submitted 30 June, 2022; originally announced July 2022.

    Comments: 8 pages, 16 figures, 1 table, submitted to BSN conference 2022

  44. User-Level Differential Privacy against Attribute Inference Attack of Speech Emotion Recognition in Federated Learning

    Authors: Tiantian Feng, Raghuveer Peri, Shrikanth Narayanan

    Abstract: Many existing privacy-enhanced speech emotion recognition (SER) frameworks focus on perturbing the original speech data through adversarial training within a centralized machine learning setup. However, this privacy protection scheme can fail since the adversary can still access the perturbed data. In recent years, distributed learning algorithms, especially federated learning (FL), have gained po…

    Submitted 16 May, 2022; v1 submitted 5 April, 2022; originally announced April 2022.

    Journal ref: Proc. Interspeech 2022

  45. arXiv:2204.00657  [pdf, other]

    eess.AS cs.SD

    Multimodal Clustering with Role Induced Constraints for Speaker Diarization

    Authors: Nikolaos Flemotomos, Shrikanth Narayanan

    Abstract: Speaker clustering is an essential step in conventional speaker diarization systems and is typically addressed as an audio-only speech processing task. The language used by the participants in a conversation, however, carries additional information that can help improve the clustering performance. This is especially true in conversational interactions, such as business meetings, interviews, and le…

    Submitted 11 July, 2022; v1 submitted 1 April, 2022; originally announced April 2022.

    Comments: To appear at Interspeech 2022

  46. arXiv:2203.15961  [pdf, other]

    cs.MM cs.CV cs.SD eess.AS

    Using Active Speaker Faces for Diarization in TV shows

    Authors: Rahul Sharma, Shrikanth Narayanan

    Abstract: Speaker diarization is one of the critical components of computational media intelligence as it enables a character-level analysis of story portrayals and media content understanding. Automated audio-based speaker diarization of entertainment media poses challenges due to the diverse acoustic conditions present in media content, be it background music, overlapping speakers, or sound effects. At th…

    Submitted 29 March, 2022; originally announced March 2022.

    Comments: Submitted to Interspeech 2022

  47. arXiv:2203.15283  [pdf, other]

    eess.AS cs.LG

    Mel Frequency Spectral Domain Defenses against Adversarial Attacks on Speech Recognition Systems

    Authors: Nicholas Mehlman, Anirudh Sreeram, Raghuveer Peri, Shrikanth Narayanan

    Abstract: A variety of recent works have looked into defenses for deep neural networks against adversarial attacks, particularly within the image processing domain. Speech processing applications such as automatic speech recognition (ASR) increasingly rely on deep learning models, and so are also prone to adversarial attacks. However, many of the defenses explored for ASR simply adapt the image-domain…

    Submitted 29 March, 2022; originally announced March 2022.

    Comments: This paper is 5 pages long and was submitted to Interspeech 2022

  48. arXiv:2203.09122  [pdf, other]

    eess.AS

    To train or not to train adversarially: A study of bias mitigation strategies for speaker recognition

    Authors: Raghuveer Peri, Krishna Somandepalli, Shrikanth Narayanan

    Abstract: Speaker recognition is increasingly used in several everyday applications including smart speakers, customer care centers and other speech-driven analytics. It is crucial to accurately evaluate and mitigate biases present in machine learning (ML) based speech technologies, such as speaker recognition, to ensure their inclusive adoption. ML fairness studies with respect to various demographic facto…

    Submitted 17 March, 2022; originally announced March 2022.

    Comments: Preprint submitted to Computer Speech and Language (Elsevier)

  49. arXiv:2203.08810  [pdf, ps, other]

    eess.AS cs.CR cs.LG cs.SD

    Semi-FedSER: Semi-supervised Learning for Speech Emotion Recognition On Federated Learning using Multiview Pseudo-Labeling

    Authors: Tiantian Feng, Shrikanth Narayanan

    Abstract: Speech Emotion Recognition (SER) applications are frequently associated with privacy concerns, as they often acquire speech data on the client side and transmit it to remote cloud platforms for further processing. These speech data can reveal not only speech content and affective information but also the speaker's identity, demographic traits, and health status. Federated learning (FL) is a distributed machi…

    Submitted 15 March, 2022; originally announced March 2022.

    Comments: This paper was submitted to Interspeech 2022 for review

    Journal ref: Proc. Interspeech 2022

  50. A 120dB Programmable-Range On-Chip Pulse Generator for Characterizing Ferroelectric Devices

    Authors: Shyam Narayanan, Erika Covi, Viktor Havel, Charlotte Frenkel, Suzanne Lancaster, Quang Duong, Stefan Slesazeck, Thomas Mikolajick, Melika Payvand, Giacomo Indiveri

    Abstract: Novel non-volatile memory devices based on ferroelectric thin films represent a promising emerging technology that is ideally suited for neuromorphic applications. The physical switching mechanism in such films is the nucleation and growth of ferroelectric domains. Since this has a strong dependence on both pulse width and voltage amplitude, it is important to use precise pulsing schemes for a tho…

    Submitted 8 February, 2022; originally announced February 2022.

    Journal ref: 2022 IEEE International Symposium on Circuits and Systems (ISCAS)