Showing 1–42 of 42 results for author: Feng, T

Searching in archive eess.
  1. arXiv:2408.15803  [pdf, other]

    eess.AS cs.AI cs.SD

    ModalityMirror: Improving Audio Classification in Modality Heterogeneity Federated Learning with Multimodal Distillation

    Authors: Tiantian Feng, Tuo Zhang, Salman Avestimehr, Shrikanth S. Narayanan

    Abstract: Multimodal Federated Learning frequently encounters challenges of client modality heterogeneity, leading to undesired performances for secondary modality in multimodal learning. It is particularly prevalent in audiovisual learning, with audio often assumed to be the weaker modality in recognition tasks. To address this challenge, we introduce ModalityMirror to improve audio model performance by…

    Submitted 28 August, 2024; originally announced August 2024.

  2. arXiv:2407.13227  [pdf, other]

    eess.SY

    Solving the Model Unavailable MARE using Q-Learning Algorithm

    Authors: Fei Yan, Jie Gao, Tao Feng, Jianxing Liu

    Abstract: In this paper, the discrete-time modified algebraic Riccati equation (MARE) is solved when the system model is completely unavailable. To achieve this, firstly a brand new iterative method based on the standard discrete-time algebraic Riccati equation (DARE) and its input weighting matrix is proposed to solve the MARE. For the single-input case, the iteration can be initialized by an arbitrary pos…

    Submitted 18 July, 2024; originally announced July 2024.

  3. arXiv:2406.12816  [pdf, other]

    cs.LG cs.CV eess.IV

    Neural Approximate Mirror Maps for Constrained Diffusion Models

    Authors: Berthy T. Feng, Ricardo Baptista, Katherine L. Bouman

    Abstract: Diffusion models excel at creating visually-convincing images, but they often struggle to meet subtle constraints inherent in the training data. Such constraints could be physics-based (e.g., satisfying a PDE), geometric (e.g., respecting symmetry), or semantic (e.g., including a particular number of objects). When the training data all satisfy a certain constraint, enforcing this constraint on a…

    Submitted 18 June, 2024; originally announced June 2024.

  4. arXiv:2406.08800  [pdf, other]

    cs.SD cs.LG eess.AS

    Can Synthetic Audio From Generative Foundation Models Assist Audio Recognition and Speech Modeling?

    Authors: Tiantian Feng, Dimitrios Dimitriadis, Shrikanth Narayanan

    Abstract: Recent advances in foundation models have enabled audio-generative models that produce high-fidelity sounds associated with music, events, and human actions. Despite the success achieved in modern audio-generative models, the conventional approach to assessing the quality of the audio generation relies heavily on distance metrics like Frechet Audio Distance. In contrast, we aim to evaluate the qua…

    Submitted 29 August, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted to 2024 INTERSPEECH; corrections to ActivityNet labels

  5. arXiv:2406.08644  [pdf, other]

    eess.SP cs.AI cs.SD eess.AS

    Toward Fully-End-to-End Listened Speech Decoding from EEG Signals

    Authors: Jihwan Lee, Aditya Kommineni, Tiantian Feng, Kleanthis Avramidis, Xuan Shi, Sudarsana Kadiri, Shrikanth Narayanan

    Abstract: Speech decoding from EEG signals is a challenging task, where brain activity is modeled to estimate salient characteristics of acoustic stimuli. We propose FESDE, a novel framework for Fully-End-to-end Speech Decoding from EEG signals. Our approach aims to directly reconstruct listened speech waveforms given EEG signals, where no intermediate acoustic feature processing step is required. The propo…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: accepted to Interspeech2024

  6. arXiv:2406.07890  [pdf, other]

    eess.AS cs.CL cs.LG

    Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions

    Authors: Anfeng Xu, Kevin Huang, Tiantian Feng, Lue Shen, Helen Tager-Flusberg, Shrikanth Narayanan

    Abstract: Speech foundation models, trained on vast datasets, have opened unique opportunities in addressing challenging low-resource speech understanding, such as child speech. In this work, we explore the capabilities of speech foundation models on child-adult speaker diarization. We show that exemplary foundation models can achieve 39.5% and 62.3% relative reductions in Diarization Error Rate and Speaker…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024

  7. arXiv:2406.02785  [pdf, other]

    astro-ph.IM cs.LG eess.IV

    Event-horizon-scale Imaging of M87* under Different Assumptions via Deep Generative Image Priors

    Authors: Berthy T. Feng, Katherine L. Bouman, William T. Freeman

    Abstract: Reconstructing images from the Event Horizon Telescope (EHT) observations of M87*, the supermassive black hole at the center of the galaxy M87, depends on a prior to impose desired image statistics. However, given the impossibility of directly observing black holes, there is no clear choice for a prior. We present a framework for flexibly designing a range of priors, each bringing different biases…

    Submitted 4 June, 2024; originally announced June 2024.

  8. arXiv:2404.17983  [pdf, other]

    cs.SD cs.CL eess.AS

    TI-ASU: Toward Robust Automatic Speech Understanding through Text-to-speech Imputation Against Missing Speech Modality

    Authors: Tiantian Feng, Xuan Shi, Rahul Gupta, Shrikanth S. Narayanan

    Abstract: Automatic Speech Understanding (ASU) aims at human-like speech interpretation, providing nuanced intent, emotion, sentiment, and content understanding from speech and language (text) content conveyed in speech. Typically, training a robust ASU model relies heavily on acquiring large-scale, high-quality speech and associated transcriptions. However, it is often challenging to collect or use speech…

    Submitted 27 April, 2024; originally announced April 2024.

  9. arXiv:2404.09385  [pdf, other]

    eess.AS cs.CL eess.SP

    A Large-Scale Evaluation of Speech Foundation Models

    Authors: Shu-wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li, Abdelrahman Mohamed, Shinji Watanabe, Hung-yi Lee

    Abstract: The foundation model paradigm leverages a shared foundation model to achieve state-of-the-art (SOTA) performance for various tasks, requiring minimal downstream-specific modeling and data annotation. This approach has proven crucial in the field of Natural Language Processing (NLP). However, the speech processing community lacks a similar setup to explore the paradigm systematically. In this work,…

    Submitted 29 May, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

    Comments: The extended journal version for SUPERB and SUPERB-SG. Published in IEEE/ACM TASLP. The Arxiv version is preferred

  10. arXiv:2404.00471  [pdf, other]

    physics.med-ph cs.CV cs.LG eess.IV

    Score-Based Diffusion Models for Photoacoustic Tomography Image Reconstruction

    Authors: Sreemanti Dey, Snigdha Saha, Berthy T. Feng, Manxiu Cui, Laure Delisle, Oscar Leong, Lihong V. Wang, Katherine L. Bouman

    Abstract: Photoacoustic tomography (PAT) is a rapidly-evolving medical imaging modality that combines optical absorption contrast with ultrasound imaging depth. One challenge in PAT is image reconstruction with inadequate acoustic signals due to limited sensor coverage or due to the density of the transducer array. Such cases call for solving an ill-posed inverse reconstruction problem. In this work, we use…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 5 pages

    Journal ref: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Korea, Republic of, 2024, pp. 2470-2474

  11. arXiv:2310.19113  [pdf, other]

    cs.CV cs.AI eess.SP

    Dynamic V2X Autonomous Perception from Road-to-Vehicle Vision

    Authors: Jiayao Tan, Fan Lyu, Linyan Li, Fuyuan Hu, Tingliang Feng, Fenglei Xu, Rui Yao

    Abstract: Vehicle-to-everything (V2X) perception is an innovative technology that enhances vehicle perception accuracy, thereby elevating the security and reliability of autonomous systems. However, existing V2X perception methods focus on static scenes from mainly vehicle-based vision, which is constrained by sensor capabilities and communication loads. To adapt V2X perception models to dynamic scenes, we…

    Submitted 29 October, 2023; originally announced October 2023.

  12. arXiv:2310.10835  [pdf, other]

    eess.IV cs.CV cs.LG

    Provable Probabilistic Imaging using Score-Based Generative Priors

    Authors: Yu Sun, Zihui Wu, Yifan Chen, Berthy T. Feng, Katherine L. Bouman

    Abstract: Estimating high-quality images while also quantifying their uncertainty are two desired features in an image reconstruction algorithm for solving ill-posed inverse problems. In this paper, we propose plug-and-play Monte Carlo (PMC) as a principled framework for characterizing the space of possible solutions to a general inverse problem. PMC is able to incorporate expressive score-based generative…

    Submitted 28 August, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

  13. arXiv:2310.01867  [pdf, other]

    eess.AS cs.SD

    Audio-visual child-adult speaker classification in dyadic interactions

    Authors: Anfeng Xu, Kevin Huang, Tiantian Feng, Helen Tager-Flusberg, Shrikanth Narayanan

    Abstract: Interactions involving children span a wide range of important domains from learning to clinical diagnostic and therapeutic contexts. Automated analyses of such interactions are motivated by the need to seek accurate insights and offer scale and robustness across diverse and wide-ranging conditions. Identifying the speech segments belonging to the child is a critical step in such modeling. Convent…

    Submitted 9 October, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: In review for ICASSP 2024, 5 pages

  14. arXiv:2309.15292  [pdf, other]

    cs.LG eess.SP

    Scaling Representation Learning from Ubiquitous ECG with State-Space Models

    Authors: Kleanthis Avramidis, Dominika Kunc, Bartosz Perz, Kranti Adsul, Tiantian Feng, Przemysław Kazienko, Stanisław Saganowski, Shrikanth Narayanan

    Abstract: Ubiquitous sensing from wearable devices in the wild holds promise for enhancing human well-being, from diagnosing clinical conditions and measuring stress to building adaptive health promoting scaffolds. But the large volumes of data therein across heterogeneous contexts pose challenges for conventional supervised learning approaches. Representation Learning from biological signals is an emerging…

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: Pre-print, currently under review

  15. arXiv:2309.10993  [pdf, other]

    cs.SD cs.HC eess.AS

    Directional Source Separation for Robust Speech Recognition on Smart Glasses

    Authors: Tiantian Feng, Ju Lin, Yiteng Huang, Weipeng He, Kaustubh Kalgaonkar, Niko Moritz, Li Wan, Xin Lei, Ming Sun, Frank Seide

    Abstract: Modern smart glasses leverage advanced audio sensing and machine learning technologies to offer real-time transcribing and captioning services, considerably enriching human experiences in daily communications. However, such systems frequently encounter challenges related to environmental noises, resulting in degradation to speech recognition and speaker change detection. To improve voice quality,…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP 2024

  16. arXiv:2309.08108  [pdf, other]

    cs.SD eess.AS

    Foundation Model Assisted Automatic Speech Emotion Recognition: Transcribing, Annotating, and Augmenting

    Authors: Tiantian Feng, Shrikanth Narayanan

    Abstract: Significant advances are being made in speech emotion recognition (SER) using deep learning models. Nonetheless, training SER systems remains challenging, requiring both time and costly resources. Like many other machine learning tasks, acquiring datasets for SER requires substantial data annotation efforts, including transcription and labeling. These annotation processes present challenges when a…

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: Under review

  17. arXiv:2308.12610  [pdf, other]

    cs.MM cs.SD eess.AS

    Emotion-Aligned Contrastive Learning Between Images and Music

    Authors: Shanti Stewart, Kleanthis Avramidis, Tiantian Feng, Shrikanth Narayanan

    Abstract: Traditional music search engines rely on retrieval methods that match natural language queries with music metadata. There have been increasing efforts to expand retrieval methods to consider the audio characteristics of music itself, using queries of various modalities including text, video, and speech. While most approaches aim to match general music semantics to the input queries, only a few foc…

    Submitted 20 September, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

    Comments: 4 pages + 1 reference page, 1 figure, 3 tables. Under review for publication

  18. arXiv:2307.16398  [pdf, other]

    eess.AS

    Robust Self Supervised Speech Embeddings for Child-Adult Classification in Interactions involving Children with Autism

    Authors: Rimita Lahiri, Tiantian Feng, Rajat Hebbar, Catherine Lord, So Hyun Kim, Shrikanth Narayanan

    Abstract: We address the problem of detecting who spoke when in child-inclusive spoken interactions i.e., automatic child-adult speaker classification. Interactions involving children are richly heterogeneous due to developmental differences. The presence of neurodiversity e.g., due to Autism, contributes additional variability. We investigate the impact of additional pre-training with more unlabelled child…

    Submitted 31 July, 2023; originally announced July 2023.

  19. arXiv:2307.04445  [pdf, other]

    cs.LG eess.SP

    Learning Behavioral Representations of Routines From Large-scale Unlabeled Wearable Time-series Data Streams using Hawkes Point Process

    Authors: Tiantian Feng, Brandon M Booth, Shrikanth Narayanan

    Abstract: Continuously-worn wearable sensors enable researchers to collect copious amounts of rich bio-behavioral time series recordings of real-life activities of daily living, offering unprecedented opportunities to infer novel human behavior patterns during daily routines. Existing approaches to routine discovery through bio-behavioral data rely either on pre-defined notions of activities or use addition…

    Submitted 10 July, 2023; originally announced July 2023.

    Comments: 2023 9th ACM SIGKDD International Workshop on Mining and Learning From Time Series (MiLeTS 2023)

  20. arXiv:2306.07791  [pdf, other]

    cs.SD eess.AS

    Unlocking Foundation Models for Privacy-Enhancing Speech Understanding: An Early Study on Low Resource Speech Training Leveraging Label-guided Synthetic Speech Content

    Authors: Tiantian Feng, Digbalay Bose, Xuan Shi, Shrikanth Narayanan

    Abstract: Automatic Speech Understanding (ASU) leverages the power of deep learning models for accurate interpretation of human speech, leading to a wide range of speech applications that enrich the human experience. However, training a robust ASU model requires the curation of a large number of speech samples, creating risks for privacy breaches. In this work, we investigate using foundation models to assi…

    Submitted 13 June, 2023; originally announced June 2023.

  21. PEFT-SER: On the Use of Parameter Efficient Transfer Learning Approaches For Speech Emotion Recognition Using Pre-trained Speech Models

    Authors: Tiantian Feng, Shrikanth Narayanan

    Abstract: Many recent studies have focused on fine-tuning pre-trained models for speech emotion recognition (SER), resulting in promising performance compared to traditional methods that rely largely on low-level, knowledge-inspired acoustic features. These pre-trained speech models learn general-purpose speech representations using self-supervised or weakly-supervised learning objectives from large-scale d…

    Submitted 14 February, 2024; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: This work was accepted to the 11th International Conference on Affective Computing and Intelligent Interaction (ACII), 2023

  22. arXiv:2305.14117  [pdf, other]

    eess.AS cs.LG

    Understanding Spoken Language Development of Children with ASD Using Pre-trained Speech Embeddings

    Authors: Anfeng Xu, Rajat Hebbar, Rimita Lahiri, Tiantian Feng, Lindsay Butler, Lue Shen, Helen Tager-Flusberg, Shrikanth Narayanan

    Abstract: Speech processing techniques are useful for analyzing speech and language development in children with Autism Spectrum Disorder (ASD), who are often varied and delayed in acquiring these skills. Early identification and intervention are crucial, but traditional assessment methodologies such as caregiver reports are not adequate for the requisite behavioral phenotyping. Natural Language Sample (NLS…

    Submitted 31 May, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted to Interspeech 2023, 5 pages

  23. arXiv:2305.11229  [pdf, other]

    cs.SD eess.AS

    TrustSER: On the Trustworthiness of Fine-tuning Pre-trained Speech Embeddings For Speech Emotion Recognition

    Authors: Tiantian Feng, Rajat Hebbar, Shrikanth Narayanan

    Abstract: Recent studies have explored the use of pre-trained embeddings for speech emotion recognition (SER), achieving comparable performance to conventional methods that rely on low-level knowledge-inspired acoustic features. These embeddings are often generated from models trained on large-scale speech datasets using self-supervised or weakly-supervised learning objectives. Despite the significant advan…

    Submitted 18 May, 2023; originally announced May 2023.

  24. arXiv:2302.12757  [pdf, other]

    eess.AS cs.CL cs.SD

    Ensemble knowledge distillation of self-supervised speech models

    Authors: Kuan-Po Huang, Tzu-hsun Feng, Yu-Kuan Fu, Tsu-Yuan Hsu, Po-Chieh Yen, Wei-Cheng Tseng, Kai-Wei Chang, Hung-yi Lee

    Abstract: Distilled self-supervised models have shown competitive performance and efficiency in recent years. However, there is a lack of experience in jointly distilling multiple self-supervised speech models. In our work, we performed Ensemble Knowledge Distillation (EKD) on various self-supervised speech models such as HuBERT, RobustHuBERT, and WavLM. We tried two different aggregation techniques, layerw…

    Submitted 24 February, 2023; originally announced February 2023.

    Comments: Accepted by ICASSP 2023

  25. arXiv:2212.09090  [pdf, other]

    cs.SD cs.MM eess.AS

    Exploring Workplace Behaviors through Speaking Patterns using Large-scale Multimodal Wearable Recordings: A Study of Healthcare Providers

    Authors: Tiantian Feng, Shrikanth Narayanan

    Abstract: Interpersonal spoken communication is central to human interaction and the exchange of information. Such interactive processes involve not only speech and spoken language but also non-verbal cues such as hand gestures, facial expressions, and nonverbal vocalization, that are used to express feelings and provide feedback. These multimodal communication signals carry a variety of information about t…

    Submitted 18 December, 2022; originally announced December 2022.

  26. arXiv:2212.09006  [pdf, other]

    cs.SD cs.LG eess.AS

    A Review of Speech-centric Trustworthy Machine Learning: Privacy, Safety, and Fairness

    Authors: Tiantian Feng, Rajat Hebbar, Nicholas Mehlman, Xuan Shi, Aditya Kommineni, Shrikanth Narayanan

    Abstract: Speech-centric machine learning systems have revolutionized many leading domains ranging from transportation and healthcare to education and defense, profoundly changing how people live, work, and interact with each other. However, recent studies have demonstrated that many speech-centric ML systems may need to be considered more trustworthy for broader deployment. Specifically, concerns over priv…

    Submitted 16 April, 2023; v1 submitted 17 December, 2022; originally announced December 2022.

    Journal ref: APSIPA Transactions on Signal and Information Processing, vol. 12, no. 3, 2023

  27. arXiv:2211.14806  [pdf, other]

    eess.SY

    Efficient Demand Response Location Targeting for Price Spike Mitigation by Exploiting Price-demand Relationship

    Authors: Yufan Zhang, Honglin Wen, Tao Feng, Yize Chen

    Abstract: Demand response (DR) leverages demand-side flexibility, offering a promising approach to enhance market conditions like mitigating wholesale price spikes. However, poorly chosen DR locations can inadvertently increase electricity prices. For that, we introduce a method to rigorously select DR locations and corresponding demand reductions. We formulate a bilevel program where the upper level determ…

    Submitted 5 August, 2024; v1 submitted 27 November, 2022; originally announced November 2022.

    Comments: Submitted to Applied Energy

  28. arXiv:2211.09949  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Compressing Transformer-based self-supervised models for speech processing

    Authors: Tzu-Quan Lin, Tsung-Huan Yang, Chun-Yao Chang, Kuang-Ming Chen, Tzu-hsun Feng, Hung-yi Lee, Hao Tang

    Abstract: Despite the success of Transformers in self-supervised learning with applications to various downstream tasks, the computational cost of training and inference remains a major challenge for applying these models to a wide spectrum of devices. Several isolated attempts have been made to compress Transformers, but the settings and metrics are different across studies. Trade-off at various compressi…

    Submitted 26 January, 2024; v1 submitted 17 November, 2022; originally announced November 2022.

    Comments: Submitted to IEEE Transactions on Audio, Speech and Language Processing (TASLP)

  29. arXiv:2210.15826  [pdf, other]

    eess.SP cs.HC

    Multimodal Estimation of Change Points of Physiological Arousal in Drivers

    Authors: Kleanthis Avramidis, Tiantian Feng, Digbalay Bose, Shrikanth Narayanan

    Abstract: Detecting unsafe driving states, such as stress, drowsiness, and fatigue, is an important component of ensuring driving safety and an essential prerequisite for automatic intervention systems in vehicles. These concerning conditions are primarily connected to the driver's low or high arousal levels. In this study, we describe a framework for processing multimodal physiological time-series from wea…

    Submitted 27 October, 2022; originally announced October 2022.

    Comments: 5 pages, 3 tables, 4 figures

  30. arXiv:2210.15707  [pdf, other]

    cs.SD cs.DC eess.AS

    FedAudio: A Federated Learning Benchmark for Audio Tasks

    Authors: Tuo Zhang, Tiantian Feng, Samiul Alam, Sunwoo Lee, Mi Zhang, Shrikanth S. Narayanan, Salman Avestimehr

    Abstract: Federated learning (FL) has gained substantial attention in recent years due to the data privacy concerns related to the pervasiveness of consumer devices that continuously collect data from users. While a number of FL benchmarks have been developed to facilitate FL research, none of them include audio data and audio-related tasks. In this paper, we fill this critical gap by introducing a new FL b…

    Submitted 8 February, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

  31. arXiv:2210.08634  [pdf, other]

    cs.CL cs.SD eess.AS

    SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning

    Authors: Tzu-hsun Feng, Annie Dong, Ching-Feng Yeh, Shu-wen Yang, Tzu-Quan Lin, Jiatong Shi, Kai-Wei Chang, Zili Huang, Haibin Wu, Xuankai Chang, Shinji Watanabe, Abdelrahman Mohamed, Shang-Wen Li, Hung-yi Lee

    Abstract: We present the SUPERB challenge at SLT 2022, which aims at learning self-supervised speech representation for better performance, generalization, and efficiency. The challenge builds upon the SUPERB benchmark and implements metrics to measure the computation requirements of self-supervised learning (SSL) representation and to evaluate its generalizability and performance across the diverse SUPERB…

    Submitted 29 October, 2022; v1 submitted 16 October, 2022; originally announced October 2022.

    Comments: Accepted by 2022 SLT Workshop

  32. User-Level Differential Privacy against Attribute Inference Attack of Speech Emotion Recognition in Federated Learning

    Authors: Tiantian Feng, Raghuveer Peri, Shrikanth Narayanan

    Abstract: Many existing privacy-enhanced speech emotion recognition (SER) frameworks focus on perturbing the original speech data through adversarial training within a centralized machine learning setup. However, this privacy protection scheme can fail since the adversary can still access the perturbed data. In recent years, distributed learning algorithms, especially federated learning (FL), have gained po…

    Submitted 16 May, 2022; v1 submitted 5 April, 2022; originally announced April 2022.

    Journal ref: Proc. Interspeech 2022

  33. arXiv:2203.08810  [pdf, ps, other]

    eess.AS cs.CR cs.LG cs.SD

    Semi-FedSER: Semi-supervised Learning for Speech Emotion Recognition On Federated Learning using Multiview Pseudo-Labeling

    Authors: Tiantian Feng, Shrikanth Narayanan

    Abstract: Speech Emotion Recognition (SER) application is frequently associated with privacy concerns as it often acquires and transmits speech data at the client-side to remote cloud platforms for further processing. These speech data can reveal not only speech content and affective information but the speaker's identity, demographic traits, and health status. Federated learning (FL) is a distributed machi…

    Submitted 15 March, 2022; originally announced March 2022.

    Comments: This paper was submitted to Interspeech 2022 for review

    Journal ref: Proc. Interspeech 2022

  34. arXiv:2105.08630  [pdf, other]

    eess.IV cs.CV cs.LG

    Fast and Accurate Single-Image Depth Estimation on Mobile Devices, Mobile AI 2021 Challenge: Report

    Authors: Andrey Ignatov, Grigory Malivenko, David Plowman, Samarth Shukla, Radu Timofte, Ziyu Zhang, Yicheng Wang, Zilong Huang, Guozhong Luo, Gang Yu, Bin Fu, Yiran Wang, Xingyi Li, Min Shi, Ke Xian, Zhiguo Cao, Jin-Hua Du, Pei-Lin Wu, Chao Ge, Jiaoyang Yao, Fangwen Tu, Bo Li, Jung Eun Yoo, Kwanggyoon Seo, Jialei Xu, et al. (13 additional authors not shown)

    Abstract: Depth estimation is an important computer vision problem with many practical applications to mobile devices. While many solutions have been proposed for this task, they are usually very computationally expensive and thus are not applicable for on-device inference. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop an end-to-end deep learning-based d…

    Submitted 17 May, 2021; originally announced May 2021.

    Comments: Mobile AI 2021 Workshop and Challenges: https://ai-benchmark.com/workshops/mai/2021/. arXiv admin note: text overlap with arXiv:2105.07809

  35. Visual Vibration Tomography: Estimating Interior Material Properties from Monocular Video

    Authors: Berthy T. Feng, Alexander C. Ogren, Chiara Daraio, Katherine L. Bouman

    Abstract: An object's interior material properties, while invisible to the human eye, determine motion observed on its surface. We propose an approach that estimates heterogeneous material properties of an object from a monocular video of its surface vibrations. Specifically, we show how to estimate Young's modulus and density throughout a 3D object with known geometry. Knowledge of how these values change…

    Submitted 23 April, 2023; v1 submitted 6 April, 2021; originally announced April 2021.

  36. arXiv:2101.08918  [pdf, other]

    cs.IT eess.SP

    Performance Analysis for Cache-enabled Cellular Networks with Cooperative Transmission

    Authors: Tianming Feng, Shuo Shi, Shushi Gu, Ning Zhang, Wei Xiang, Xuemai Gu

    Abstract: The large amount of deployed smart devices put tremendous traffic pressure on networks. Caching at the edge has been widely studied as a promising technique to solve this problem. To further improve the successful transmission probability (STP) of cache-enabled cellular networks (CEN), we combine the cooperative transmission technique with CEN and propose a novel transmission scheme. Local channel…

    Submitted 21 January, 2021; originally announced January 2021.

    Comments: arXiv admin note: text overlap with arXiv:2101.08669

  37. arXiv:2004.11332  [pdf, ps, other]

    cs.IT eess.SP

    UAV-Enabled Data Collection for Wireless Sensor Networks with Distributed Beamforming

    Authors: Tianxin Feng, Lifeng Xie, Jianping Yao, Jie Xu

    Abstract: This paper studies an unmanned aerial vehicle (UAV)-enabled wireless sensor network, in which one UAV flies in the sky to collect the data transmitted from a set of ground nodes (GNs) via distributed beamforming. We consider two scenarios with delay-tolerant and delay-sensitive applications, in which the GNs send the common/shared messages to the UAV via adaptive- and fixed-rate transmissions, res…

    Submitted 7 August, 2021; v1 submitted 23 April, 2020; originally announced April 2020.

    Comments: Double-column, 15 pages, 8 figures, 4 tables. Accepted for publication in the IEEE Transactions on Wireless Communications. It overlaps with the former version (arXiv:2004.11332)

  38. arXiv:2003.08474  [pdf, other]

    eess.SP cs.CY cs.HC stat.AP

    TILES-2018, a longitudinal physiologic and behavioral data set of hospital workers

    Authors: Karel Mundnich, Brandon M. Booth, Michelle L'Hommedieu, Tiantian Feng, Benjamin Girault, Justin L'Hommedieu, Mackenzie Wildman, Sophia Skaaden, Amrutha Nadarajan, Jennifer L. Villatte, Tiago H. Falk, Kristina Lerman, Emilio Ferrara, Shrikanth Narayanan

    Abstract: We present a novel longitudinal multimodal corpus of physiological and behavioral data collected from direct clinical providers in a hospital workplace. We designed the study to investigate the use of off-the-shelf wearable and environmental sensors to understand individual-specific constructs such as job performance, interpersonal interaction, and well-being of hospital workers over time in their…

    Submitted 18 December, 2020; v1 submitted 18 March, 2020; originally announced March 2020.

    Comments: 57 pages, 9 figures, journal paper

    Journal ref: Sci Data 7, 354 (2020)

  39. arXiv:2003.03574  [pdf, ps, other]

    cs.IT eess.SP

    Outage Probability Minimization for UAV-Enabled Data Collection with Distributed Beamforming

    Authors: Tianxin Feng, Lifeng Xie, Jianping Yao, Jie Xu

    Abstract: This paper studies an unmanned aerial vehicle (UAV)-enabled wireless sensor network, in which one UAV flies in the sky to collect the data transmitted from a set of sensors via distributed beamforming. We consider the delay-sensitive application scenario, in which the sensors transmit the common/shared messages by using fixed data rates and adaptive transmit powers. Under this setup, we jointly op…

    Submitted 7 March, 2020; originally announced March 2020.

    Comments: 6 pages, 4 figures, ICC 2020 workshop

  40. arXiv:2003.00127  [pdf]

    eess.IV

    Time of arrival imaging: The proof of concept for a novel medical imaging modality

    Authors: Tao Feng

    Abstract: It has been shown that with the use of ultra-wideband (UWB) electromagnetic signal and time of arrival (ToA) principle, it is possible to locate medical implants given the permittivity distribution of the body. We propose a new imaging modality using the reverse process to acquire permittivity distributions as a surrogate of human anatomy. In the proposed systems, the locations of the signal sourc…

    Submitted 28 February, 2020; originally announced March 2020.

  41. arXiv:1906.08889  [pdf, other]

    cs.RO cs.CV eess.IV

    SGANVO: Unsupervised Deep Visual Odometry and Depth Estimation with Stacked Generative Adversarial Networks

    Authors: Tuo Feng, Dongbing Gu

    Abstract: Recently end-to-end unsupervised deep learning methods have achieved an effect beyond geometric methods for visual depth and ego-motion estimation tasks. These data-based learning methods perform more robustly and accurately in some of the challenging scenes. The encoder-decoder network has been widely used in the depth estimation and the RCNN has brought significant improvements in the ego-motion…

    Submitted 20 June, 2019; originally announced June 2019.

    Comments: 7 pages, 4 figures

    Report number: ras.ral.19-0181.628f4a7b

  42. arXiv:1903.08860  [pdf, ps, other]

    cs.IT eess.SP

    Cognitive Wireless Power Transfer in the Presence of Reactive Primary Communication User

    Authors: Tianxin Feng, Ganggang Ma, Jie Xu

    Abstract: This paper studies a cognitive or secondary multi-antenna wireless power transfer (WPT) system over a multi-carrier channel, which shares the same spectrum with a primary wireless information transfer (WIT) system that employs adaptive water-filling power allocation. By controlling the transmit energy beamforming over sub-carriers (SCs), the secondary energy transmitter (S-ET) can directly charge…

    Submitted 21 March, 2019; originally announced March 2019.