
Showing 1–50 of 159 results for author: Wu, M

Searching in archive eess.
  1. arXiv:2408.09315  [pdf, other]

    eess.IV cs.CV

    Unpaired Volumetric Harmonization of Brain MRI with Conditional Latent Diffusion

    Authors: Mengqi Wu, Minhui Yu, Shuaiming Jing, Pew-Thian Yap, Zhengwu Zhang, Mingxia Liu

    Abstract: Multi-site structural MRI is increasingly used in neuroimaging studies to diversify subject cohorts. However, combining MR images acquired from various sites/centers may introduce site-related non-biological variations. Retrospective image harmonization helps address this issue, but current methods usually perform harmonization on pre-extracted hand-crafted radiomic features, limiting downstream a…

    Submitted 17 August, 2024; originally announced August 2024.

  2. arXiv:2407.14355  [pdf, other]

    cs.SD eess.AS

    Enhancing Zero-shot Audio Classification using Sound Attribute Knowledge from Large Language Models

    Authors: Xuenan Xu, Pingyue Zhang, Ming Yan, Ji Zhang, Mengyue Wu

    Abstract: Zero-shot audio classification aims to recognize and classify a sound class that the model has never seen during training. This paper presents a novel approach for zero-shot audio classification using automatically generated sound attribute descriptions. We propose a list of sound attributes and leverage large language model's domain knowledge to generate detailed attribute descriptions for each c…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: Interspeech 2024

  3. arXiv:2407.14329  [pdf, other]

    cs.SD eess.AS

    Efficient Audio Captioning with Encoder-Level Knowledge Distillation

    Authors: Xuenan Xu, Haohe Liu, Mengyue Wu, Wenwu Wang, Mark D. Plumbley

    Abstract: Significant improvement has been achieved in automated audio captioning (AAC) with recent models. However, these models have become increasingly large as their performance is enhanced. In this work, we propose a knowledge distillation (KD) framework for AAC. Our analysis shows that in the encoder-decoder based AAC models, it is more effective to distill knowledge into the encoder as compared with…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: Interspeech 2024

  4. arXiv:2407.13198  [pdf, other]

    cs.SD eess.AS

    DiveSound: LLM-Assisted Automatic Taxonomy Construction for Diverse Audio Generation

    Authors: Baihan Li, Zeyu Xie, Xuenan Xu, Yiwei Guo, Ming Yan, Ji Zhang, Kai Yu, Mengyue Wu

    Abstract: Audio generation has attracted significant attention. Despite remarkable enhancement in audio quality, existing models overlook diversity evaluation. This is partially due to the lack of a systematic sound class diversity framework and a matching dataset. To address these issues, we propose DiveSound, a novel framework for constructing multimodal datasets with in-class diversified taxonomy, assist…

    Submitted 18 July, 2024; originally announced July 2024.

  5. arXiv:2407.02869  [pdf, other]

    cs.SD eess.AS

    PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation

    Authors: Zeyu Xie, Xuenan Xu, Zhizheng Wu, Mengyue Wu

    Abstract: Recently, audio generation tasks have attracted considerable research interests. Precise temporal controllability is essential to integrate audio generation with real applications. In this work, we propose a temporal controlled audio generation framework, PicoAudio. PicoAudio integrates temporal information to guide audio generation through tailored model design. It leverages data crawling, segmen…

    Submitted 17 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    MSC Class: 68Txx ACM Class: I.2

  6. arXiv:2407.02857  [pdf, other]

    cs.SD eess.AS

    AudioTime: A Temporally-aligned Audio-text Benchmark Dataset

    Authors: Zeyu Xie, Xuenan Xu, Zhizheng Wu, Mengyue Wu

    Abstract: Recent advancements in audio generation have enabled the creation of high-fidelity audio clips from free-form textual descriptions. However, temporal relationships, a critical feature for audio content, are currently underrepresented in mainstream models, resulting in an imprecise temporal controllability. Specifically, users cannot accurately control the timestamps of sound events using free-form…

    Submitted 3 July, 2024; originally announced July 2024.

    MSC Class: 68Txx ACM Class: I.2

  7. arXiv:2407.02052  [pdf, other]

    eess.AS cs.SD

    The USTC-NERCSLIP Systems for The ICMC-ASR Challenge

    Authors: Minghui Wu, Luzhen Xu, Jie Zhang, Haitao Tang, Yanyan Yue, Ruizhi Liao, Jintao Zhao, Zhengzhe Zhang, Yichi Wang, Haoyin Yan, Hongliang Yu, Tongle Ma, Jiachen Liu, Chongliang Wu, Yongchao Li, Yanyong Zhang, Xin Fang, Yue Zhang

    Abstract: This report describes the submitted system to the In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) challenge, which considers the ASR task with multi-speaker overlapping and Mandarin accent dynamics in the ICMC case. We implement the front-end speaker diarization using the self-supervised learning representation based multi-speaker embedding and beamforming using the speaker position,…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted at ICASSP 2024

  8. arXiv:2406.14092  [pdf, other]

    cs.CL eess.AS

    Seamless Language Expansion: Enhancing Multilingual Mastery in Self-Supervised Models

    Authors: Jing Xu, Minglin Wu, Xixin Wu, Helen Meng

    Abstract: Self-supervised (SSL) models have shown great performance in various downstream tasks. However, they are typically developed for limited languages, and may encounter new languages in real-world. Developing a SSL model for each new language is costly. Thus, it is vital to figure out how to efficiently adapt existed SSL models to a new language without impairing its original abilities. We propose ad…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  9. arXiv:2406.08052  [pdf, other]

    cs.SD eess.AS

    FakeSound: Deepfake General Audio Detection

    Authors: Zeyu Xie, Baihan Li, Xuenan Xu, Zheng Liang, Kai Yu, Mengyue Wu

    Abstract: With the advancement of audio generation, generative models can produce highly realistic audios. However, the proliferation of deepfake general audio can pose negative consequences. Therefore, we propose a new task, deepfake general audio detection, which aims to identify whether audio content is manipulated and to locate deepfake regions. Leveraging an automated manipulation pipeline, a dataset n…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

    MSC Class: 68Txx ACM Class: I.2

  10. arXiv:2405.19889  [pdf, other]

    eess.SP cs.IT cs.LG cs.MM

    Deep Joint Semantic Coding and Beamforming for Near-Space Airship-Borne Massive MIMO Network

    Authors: Minghui Wu, Zhen Gao, Zhaocheng Wang, Dusit Niyato, George K. Karagiannidis, Sheng Chen

    Abstract: Near-space airship-borne communication network is recognized to be an indispensable component of the future integrated ground-air-space network thanks to airships' advantage of long-term residency at stratospheric altitudes, but it urgently needs reliable and efficient Airship-to-X link. To improve the transmission efficiency and capacity, this paper proposes to integrate semantic communication wi…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Major Revision by IEEE JSAC

  11. arXiv:2405.16716  [pdf, other]

    cs.GT cs.MA eess.SY math.DS

    Adaptive Incentive Design with Learning Agents

    Authors: Chinmay Maheshwari, Kshitij Kulkarni, Manxi Wu, Shankar Sastry

    Abstract: How can the system operator learn an incentive mechanism that achieves social optimality based on limited information about the agents' behavior, who are dynamically updating their strategies? To answer this question, we propose an adaptive incentive mechanism. This mechanism updates the incentives of agents based on the feedback of each agent's externality, evaluated as the difference betw…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: 33 pages

  12. arXiv:2405.02504  [pdf, other]

    eess.IV cs.CV

    Functional Imaging Constrained Diffusion for Brain PET Synthesis from Structural MRI

    Authors: Minhui Yu, Mengqi Wu, Ling Yue, Andrea Bozoki, Mingxia Liu

    Abstract: Magnetic resonance imaging (MRI) and positron emission tomography (PET) are increasingly used in multimodal analysis of neurodegenerative disorders. While MRI is broadly utilized in clinical settings, PET is less accessible. Many studies have attempted to use deep generative models to synthesize PET from MRI scans. However, they often suffer from unstable training and inadequately preserve brain f…

    Submitted 8 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

  13. arXiv:2405.00233  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS eess.SP

    SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound

    Authors: Haohe Liu, Xuenan Xu, Yi Yuan, Mengyue Wu, Wenwu Wang, Mark D. Plumbley

    Abstract: Large language models (LLMs) have significantly advanced audio processing through audio codecs that convert audio into discrete tokens, enabling the application of language modelling techniques to audio data. However, traditional codecs often operate at high bitrates or within narrow domains such as speech and lack the semantic clues required for efficient language modelling. Addressing these chal…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: Demo and code: https://haoheliu.github.io/SemantiCodec/

  14. arXiv:2405.00075  [pdf, ps, other]

    eess.IV

    Charting the Path Forward: CT Image Quality Assessment -- An In-Depth Review

    Authors: Siyi Xun, Qiaoyu Li, Xiaohong Liu, Guangtao Zhai, Mingxiang Wu, Tao Tan

    Abstract: Computed Tomography (CT) is a frequently utilized imaging technology that is employed in the clinical diagnosis of many disorders. However, clinical diagnosis, data storage, and management are posed huge challenges by a huge volume of non-homogeneous CT data in terms of imaging quality. As a result, the quality assessment of CT images is a crucial problem that demands consideration. The history, a…

    Submitted 30 April, 2024; originally announced May 2024.

  15. arXiv:2404.16619  [pdf, other]

    cs.SD eess.AS

    The THU-HCSI Multi-Speaker Multi-Lingual Few-Shot Voice Cloning System for LIMMITS'24 Challenge

    Authors: Yixuan Zhou, Shuoyi Zhou, Shun Lei, Zhiyong Wu, Menglin Wu

    Abstract: This paper presents the multi-speaker multi-lingual few-shot voice cloning system developed by THU-HCSI team for LIMMITS'24 Challenge. To achieve high speaker similarity and naturalness in both mono-lingual and cross-lingual scenarios, we build the system upon YourTTS and add several enhancements. For further improving speaker similarity and speech quality, we introduce speaker-aware text encoder…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: Accepted in Grand Challenge of ICASSP 2024

  16. arXiv:2403.15432  [pdf, other]

    eess.SP cs.AI cs.HC cs.LG cs.RO

    BRIEDGE: EEG-Adaptive Edge AI for Multi-Brain to Multi-Robot Interaction

    Authors: Jinhui Ouyang, Mingzhu Wu, Xinglin Li, Hanhui Deng, Di Wu

    Abstract: Recent advances in EEG-based BCI technologies have revealed the potential of brain-to-robot collaboration through the integration of sensing, computing, communication, and control. In this paper, we present BRIEDGE as an end-to-end system for multi-brain to multi-robot interaction through an EEG-adaptive neural network and an encoding-decoding communication framework, as illustrated in Fig.1. As d…

    Submitted 14 March, 2024; originally announced March 2024.

  17. arXiv:2403.08162  [pdf, other]

    eess.IV cs.CV cs.LG

    Iterative Learning for Joint Image Denoising and Motion Artifact Correction of 3D Brain MRI

    Authors: Lintao Zhang, Mengqi Wu, Lihong Wang, David C. Steffens, Guy G. Potter, Mingxia Liu

    Abstract: Image noise and motion artifacts greatly affect the quality of brain MRI and negatively influence downstream medical image analysis. Previous studies often focus on 2D methods that process each volumetric MR image slice-by-slice, thus losing important 3D anatomical information. Additionally, these studies generally treat image denoising and artifact correction as two standalone tasks, without cons…

    Submitted 12 March, 2024; originally announced March 2024.

  18. arXiv:2403.04594  [pdf, other]

    cs.SD eess.AS

    A Detailed Audio-Text Data Simulation Pipeline using Single-Event Sounds

    Authors: Xuenan Xu, Xiaohang Xu, Zeyu Xie, Pingyue Zhang, Mengyue Wu, Kai Yu

    Abstract: Recently, there has been an increasing focus on audio-text cross-modal learning. However, most of the existing audio-text datasets contain only simple descriptions of sound events. Compared with classification labels, the advantages of such descriptions are significantly limited. In this paper, we first analyze the detailed information that human descriptions of audio may contain beyond sound even…

    Submitted 7 March, 2024; originally announced March 2024.

  19. PowerSkel: A Device-Free Framework Using CSI Signal for Human Skeleton Estimation in Power Station

    Authors: Cunyi Yin, Xiren Miao, Jing Chen, Hao Jiang, Jianfei Yang, Yunjiao Zhou, Min Wu, Zhenghua Chen

    Abstract: Safety monitoring of power operations in power stations is crucial for preventing accidents and ensuring stable power supply. However, conventional methods such as wearable devices and video surveillance have limitations such as high cost, dependence on light, and visual blind spots. WiFi-based human pose estimation is a suitable method for monitoring power operations due to its low cost, device-f…

    Submitted 4 March, 2024; originally announced March 2024.

  20. arXiv:2403.01278  [pdf, other]

    cs.SD eess.AS

    Enhancing Audio Generation Diversity with Visual Information

    Authors: Zeyu Xie, Baihan Li, Xuenan Xu, Mengyue Wu, Kai Yu

    Abstract: Audio and sound generation has garnered significant attention in recent years, with a primary focus on improving the quality of generated audios. However, there has been limited research on enhancing the diversity of generated audio, particularly when it comes to audio generation within specific categories. Current models tend to produce homogeneous audio samples within a category. This work aims…

    Submitted 2 March, 2024; originally announced March 2024.

    ACM Class: I.2

  21. arXiv:2402.15985  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    Phonetic and Lexical Discovery of a Canine Language using HuBERT

    Authors: Xingyuan Li, Sinong Wang, Zeyu Xie, Mengyue Wu, Kenny Q. Zhu

    Abstract: This paper delves into the pioneering exploration of potential communication patterns within dog vocalizations and transcends traditional linguistic analysis barriers, which heavily relies on human priori knowledge on limited datasets to find sound units in dog vocalization. We present a self-supervised approach with HuBERT, enabling the accurate classification of phoneme labels and the identifica…

    Submitted 24 February, 2024; originally announced February 2024.

  22. arXiv:2402.13523  [pdf, other]

    eess.SP cs.LG q-bio.NC

    Balancing Spectral, Temporal and Spatial Information for EEG-based Alzheimer's Disease Classification

    Authors: Stephan Goerttler, Fei He, Min Wu

    Abstract: The prospect of future treatment warrants the development of cost-effective screening for Alzheimer's disease (AD). A promising candidate in this regard is electroencephalography (EEG), as it is one of the most economic imaging modalities. Recent efforts in EEG analysis have shifted towards leveraging spatial information, employing novel frameworks such as graph signal processing or graph neural n…

    Submitted 30 April, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: 4 pages, 3 figures, conference paper

  23. arXiv:2402.12785  [pdf, other]

    eess.SP q-bio.NC stat.ME

    Stochastic Graph Heat Modelling for Diffusion-based Connectivity Retrieval

    Authors: Stephan Goerttler, Fei He, Min Wu

    Abstract: Heat diffusion describes the process by which heat flows from areas with higher temperatures to ones with lower temperatures. This concept was previously adapted to graph structures, whereby heat flows between nodes of a graph depending on the graph topology. Here, we combine the graph heat equation with the stochastic heat equation, which ultimately yields a model for multivariate time signals on…

    Submitted 30 April, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: 4 pages, 1 figure, conference paper

  24. arXiv:2402.12701  [pdf, other]

    eess.IV cs.CV

    wmh_seg: Transformer based U-Net for Robust and Automatic White Matter Hyperintensity Segmentation across 1.5T, 3T and 7T

    Authors: Jinghang Li, Tales Santini, Yuanzhe Huang, Joseph M. Mettenburg, Tamer S. Ibrahim, Howard J. Aizenstein, Minjie Wu

    Abstract: White matter hyperintensity (WMH) remains the top imaging biomarker for neurodegenerative diseases. Robust and accurate segmentation of WMH holds paramount significance for neuroimaging studies. The growing shift from 3T to 7T MRI necessitates robust tools for harmonized segmentation across field strengths and artifacts. Recent deep learning models exhibit promise in WMH segmentation but still fac…

    Submitted 19 February, 2024; originally announced February 2024.

  25. arXiv:2402.06875  [pdf, other]

    eess.IV cs.CV

    Disentangled Latent Energy-Based Style Translation: An Image-Level Structural MRI Harmonization Framework

    Authors: Mengqi Wu, Lintao Zhang, Pew-Thian Yap, Hongtu Zhu, Mingxia Liu

    Abstract: Brain magnetic resonance imaging (MRI) has been extensively employed across clinical and research fields, but often exhibits sensitivity to site effects arising from non-biological variations such as differences in field strength and scanner vendors. Numerous retrospective MRI harmonization techniques have demonstrated encouraging outcomes in reducing the site effects at the image level. However,…

    Submitted 29 May, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

  26. arXiv:2402.03695  [pdf, other]

    eess.IV cs.CV

    ConUNETR: A Conditional Transformer Network for 3D Micro-CT Embryonic Cartilage Segmentation

    Authors: Nishchal Sapkota, Yejia Zhang, Susan M. Motch Perrine, Yuhan Hsi, Sirui Li, Meng Wu, Greg Holmes, Abdul R. Abdulai, Ethylin W. Jabs, Joan T. Richtsmeier, Danny Z Chen

    Abstract: Studying the morphological development of cartilaginous and osseous structures is critical to the early detection of life-threatening skeletal dysmorphology. Embryonic cartilage undergoes rapid structural changes within hours, introducing biological variations and morphological shifts that limit the generalization of deep learning-based segmentation models that infer across multiple embryonic age…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: Published in ISBI 2024

  27. arXiv:2401.16844  [pdf, other]

    cs.GT cs.CY cs.MA econ.EM eess.SY

    Congestion Pricing for Efficiency and Equity: Theory and Applications to the San Francisco Bay Area

    Authors: Chinmay Maheshwari, Kshitij Kulkarni, Druv Pai, Jiarui Yang, Manxi Wu, Shankar Sastry

    Abstract: Congestion pricing, while adopted by many cities to alleviate traffic congestion, raises concerns about widening socioeconomic disparities due to its disproportionate impact on low-income travelers. In this study, we address this concern by proposing a new class of congestion pricing schemes that not only minimize congestion levels but also incorporate an equity objective to reduce cost disparitie…

    Submitted 30 January, 2024; originally announced January 2024.

    Comments: 42 pages, 11 figures

    MSC Class: 91A07; 91A14; 91A68; 91A90

  28. arXiv:2401.14717  [pdf, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS

    Turn-taking and Backchannel Prediction with Acoustic and Large Language Model Fusion

    Authors: Jinhan Wang, Long Chen, Aparna Khare, Anirudh Raju, Pranav Dheram, Di He, Minhua Wu, Andreas Stolcke, Venkatesh Ravichandran

    Abstract: We propose an approach for continuous prediction of turn-taking and backchanneling locations in spoken dialogue by fusing a neural acoustic model with a large language model (LLM). Experiments on the Switchboard human-human conversation dataset demonstrate that our approach consistently outperforms the baseline models with single modality. We also develop a novel multi-task instruction fine-tuning…

    Submitted 26 January, 2024; originally announced January 2024.

    Comments: To appear in IEEE ICASSP 2024

  29. arXiv:2401.02584  [pdf, other]

    cs.SD eess.AS

    Towards Weakly Supervised Text-to-Audio Grounding

    Authors: Xuenan Xu, Ziyang Ma, Mengyue Wu, Kai Yu

    Abstract: Text-to-audio grounding (TAG) task aims to predict the onsets and offsets of sound events described by natural language. This task can facilitate applications such as multimodal information retrieval. This paper focuses on weakly-supervised text-to-audio grounding (WSTAG), where frame-level annotations of sound events are unavailable, and only the caption of a whole audio clip can be utilized for…

    Submitted 17 July, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

  30. arXiv:2312.16422  [pdf, other]

    eess.AS cs.SD

    Selective-Memory Meta-Learning with Environment Representations for Sound Event Localization and Detection

    Authors: Jinbo Hu, Yin Cao, Ming Wu, Qiuqiang Kong, Feiran Yang, Mark D. Plumbley, Jun Yang

    Abstract: Environment shifts and conflicts present significant challenges for learning-based sound event localization and detection (SELD) methods. SELD systems, when trained in particular acoustic settings, often show restricted generalization capabilities for diverse acoustic environments. Furthermore, obtaining annotated samples for spatial sound events is notably costly. Deploying a SELD system in a new…

    Submitted 22 August, 2024; v1 submitted 27 December, 2023; originally announced December 2023.

    Comments: 14 pages, 11 figures, accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)

  31. arXiv:2312.10343  [pdf, other]

    eess.SP cs.AR cs.LG cs.NE

    In-Sensor Radio Frequency Computing for Energy-Efficient Intelligent Radar

    Authors: Yang Sui, Minning Zhu, Lingyi Huang, Chung-Tse Michael Wu, Bo Yuan

    Abstract: Radio Frequency Neural Networks (RFNNs) have demonstrated advantages in realizing intelligent applications across various domains. However, as the model size of deep neural networks rapidly increases, implementing large-scale RFNN in practice requires an extensive number of RF interferometers and consumes a substantial amount of energy. To address this challenge, we propose to utilize low-rank dec…

    Submitted 16 December, 2023; originally announced December 2023.

  32. arXiv:2312.06668  [pdf]

    cs.CL cs.SD eess.AS

    Evaluating Self-supervised Speech Models on a Taiwanese Hokkien Corpus

    Authors: Yi-Hui Chou, Kalvin Chang, Meng-Ju Wu, Winston Ou, Alice Wen-Hsin Bi, Carol Yang, Bryan Y. Chen, Rong-Wei Pai, Po-Yen Yeh, Jo-Peng Chiang, Iu-Tshian Phoann, Winnie Chang, Chenxuan Cui, Noel Chen, Jiatong Shi

    Abstract: Taiwanese Hokkien is declining in use and status due to a language shift towards Mandarin in Taiwan. This is partly why it is a low resource language in NLP and speech research today. To ensure that the state of the art in speech processing does not leave Taiwanese Hokkien behind, we contribute a 1.5-hour dataset of Taiwanese Hokkien to ML-SUPERB's hidden set. Evaluating ML-SUPERB's suite of self-…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: Accepted to ASRU 2023

  33. arXiv:2312.03371  [pdf, ps, other]

    eess.SP

    Understanding Concepts in Graph Signal Processing for Neurophysiological Signal Analysis

    Authors: Stephan Goerttler, Fei He, Min Wu

    Abstract: Multivariate signals, which are measured simultaneously over time and acquired by sensor networks, are becoming increasingly common. The emerging field of graph signal processing (GSP) promises to analyse spectral characteristics of these multivariate signals, while at the same time taking the spatial structure between the time signals into account. A central idea in GSP is the graph Fourier trans…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: 18 pages, 7 figures, book chapter

  34. arXiv:2312.01566  [pdf, other]

    physics.med-ph eess.IV

    Coronary Atherosclerotic Plaque Characterization with Photon-counting CT: a Simulation-based Feasibility Study

    Authors: Mengzhou Li, Mingye Wu, Jed Pack, Pengwei Wu, Bruno De Man, Adam Wang, Koen Nieman, Ge Wang

    Abstract: Recent development of photon-counting CT (PCCT) brings great opportunities for plaque characterization with much-improved spatial resolution and spectral imaging capability. While existing coronary plaque PCCT imaging results are based on detectors made of CZT or CdTe materials, deep-silicon photon-counting detectors have unique performance characteristics and promise distinct imaging capabilities…

    Submitted 3 December, 2023; originally announced December 2023.

    Comments: 13 figures, 5 tables

  35. arXiv:2311.08840  [pdf, other]

    eess.SY

    An MRL-Based Design Solution for RIS-Assisted MU-MIMO Wireless System under Time-Varying Channels

    Authors: Meng-Qian Alexander Wu, Tzu-Hsien Sang, Luisa Schuhmacher, Ming-Jie Guo, Khodr Hammoud, Sofie Pollin

    Abstract: Utilizing Deep Reinforcement Learning (DRL) for Reconfigurable Intelligent Surface (RIS) assisted wireless communication has been extensively researched. However, existing DRL methods either act as a simple optimizer or only solve problems with concurrent Channel State Information (CSI) represented in the training data set. Consequently, solutions for RIS-assisted wireless communication systems un…

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: To be published in proceedings of the 2023 IEEE Conference on Global Communications (GLOBECOM)

  36. arXiv:2310.14165  [pdf, other]

    cs.LG cs.AI eess.SP

    Graph Convolutional Network with Connectivity Uncertainty for EEG-based Emotion Recognition

    Authors: Hongxiang Gao, Xiangyao Wang, Zhenghua Chen, Min Wu, Zhipeng Cai, Lulu Zhao, Jianqing Li, Chengyu Liu

    Abstract: Automatic emotion recognition based on multichannel Electroencephalography (EEG) holds great potential in advancing human-computer interaction. However, several significant challenges persist in existing research on algorithmic emotion recognition. These challenges include the need for a robust model to effectively learn discriminative node attributes over long paths, the exploration of ambiguous…

    Submitted 21 October, 2023; originally announced October 2023.

    Comments: 10 pages

  37. arXiv:2309.13086  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    Towards Lexical Analysis of Dog Vocalizations via Online Videos

    Authors: Yufei Wang, Chunhao Zhang, Jieyi Huang, Mengyue Wu, Kenny Zhu

    Abstract: Deciphering the semantics of animal language has been a grand challenge. This study presents a data-driven investigation into the semantics of dog vocalizations via correlating different sound types with consistent semantics. We first present a new dataset of Shiba Inu sounds, along with contextual information such as location and activity, collected from YouTube with a well-constructed pipeline.…

    Submitted 21 September, 2023; originally announced September 2023.

  38. arXiv:2309.13085  [pdf, other]

    cs.SD cs.LG eess.AS

    Does My Dog "Speak" Like Me? The Acoustic Correlation between Pet Dogs and Their Human Owners

    Authors: Jieyi Huang, Chunhao Zhang, Yufei Wang, Mengyue Wu, Kenny Zhu

    Abstract: How hosts language influence their pets' vocalization is an interesting yet underexplored problem. This paper presents a preliminary investigation into the possible correlation between domestic dog vocal expressions and their human host's language environment. We first present a new dataset of Shiba Inu dog vocals from YouTube, which provides 7500 clean sound clips, including their contextual info…

    Submitted 21 September, 2023; originally announced September 2023.

  39. arXiv:2309.11500  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    A Large-scale Dataset for Audio-Language Representation Learning

    Authors: Luoyi Sun, Xuenan Xu, Mengyue Wu, Weidi Xie

    Abstract: The AI community has made significant strides in developing powerful foundation models, driven by large-scale multimodal datasets. However, in the audio representation learning community, the present audio-language datasets suffer from limitations such as insufficient volume, simplistic content, and arduous collection procedures. To tackle these challenges, we present an innovative and automatic a…

    Submitted 3 October, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

  40. arXiv:2308.08847  [pdf, other]

    eess.AS cs.SD

    META-SELD: Meta-Learning for Fast Adaptation to the new environment in Sound Event Localization and Detection

    Authors: Jinbo Hu, Yin Cao, Ming Wu, Feiran Yang, Ziying Yu, Wenwu Wang, Mark D. Plumbley, Jun Yang

    Abstract: For learning-based sound event localization and detection (SELD) methods, different acoustic environments in the training and test sets may result in large performance differences in the validation and evaluation stages. Different environments, such as different sizes of rooms, different reverberation times, and different background noise, may be reasons for a learning-based system to fail. On the…

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: Submitted to DCASE 2023 Workshop

  41. arXiv:2308.06285  [pdf, other]

    cs.HC eess.IV

    An Integrated Visual Analytics System for Studying Clinical Carotid Artery Plaques

    Authors: Chaoqing Xu, Zhentao Zheng, Yiting Fu, Baofeng Chang, Legao Chen, Minghui Wu, Mingli Song, Jinsong Jiang

    Abstract: Carotid artery plaques can cause arterial vascular diseases such as stroke and myocardial infarction, posing a severe threat to human life. However, the current clinical examination mainly relies on a direct assessment by physicians of patients' clinical indicators and medical images, lacking an integrated visualization tool for analyzing the influencing factors and composition of carotid artery p…

    Submitted 8 August, 2023; originally announced August 2023.

  42. arXiv:2307.07542  [pdf, other]

    eess.SP cs.AI cs.LG

    Source-Free Domain Adaptation with Temporal Imputation for Time Series Data

    Authors: Mohamed Ragab, Emadeldeen Eldele, Min Wu, Chuan-Sheng Foo, Xiaoli Li, Zhenghua Chen

    Abstract: Source-free domain adaptation (SFDA) aims to adapt a pretrained model from a labeled source domain to an unlabeled target domain without access to the source domain data, preserving source domain privacy. Despite its prevalence in visual applications, SFDA is largely unexplored in time series applications. The existing SFDA methods that are mainly designed for visual applications may fail to handl…

    Submitted 14 July, 2023; originally announced July 2023.

    Comments: Accepted in KDD'23

  43. arXiv:2307.03942  [pdf, ps, other]

    eess.IV cs.CV

    Ariadne's Thread: Using Text Prompts to Improve Segmentation of Infected Areas from Chest X-ray images

    Authors: Yi Zhong, Mengqiu Xu, Kongming Liang, Kaixin Chen, Ming Wu

    Abstract: Segmentation of the infected areas of the lung is essential for quantifying the severity of lung disease like pulmonary infections. Existing medical image segmentation methods are almost uni-modal methods based on image. However, these image-only methods tend to produce inaccurate results unless trained with large amounts of annotated data. To overcome this challenge, we propose a language-driven…

    Submitted 8 July, 2023; originally announced July 2023.

    Comments: Provisional Acceptance by MICCAI 2023

  44. arXiv:2306.16036  [pdf, other]

    eess.IV cs.CV

    A Cascaded Approach for ultraly High Performance Lesion Detection and False Positive Removal in Liver CT Scans

    Authors: Fakai Wang, Chi-Tung Cheng, Chien-Wei Peng, Ke Yan, Min Wu, Le Lu, Chien-Hung Liao, Ling Zhang

    Abstract: Liver cancer has high morbidity and mortality rates in the world. Multi-phase CT is a main medical imaging modality for detecting/identifying and diagnosing liver tumors. Automatically detecting and classifying liver lesions in CT images have the potential to improve the clinical workflow. This task remains challenging due to liver lesions' large variations in size, appearance, image contrast, and…

    Submitted 28 June, 2023; originally announced June 2023.

  45. arXiv:2306.10090  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    Improving Audio Caption Fluency with Automatic Error Correction

    Authors: Hanxue Zhang, Zeyu Xie, Xuenan Xu, Mengyue Wu, Kai Yu

    Abstract: Automated audio captioning (AAC) is an important cross-modality translation task, aiming at generating descriptions for audio clips. However, captions generated by previous AAC models have faced "false-repetition" errors due to the training objective. In such scenarios, we propose a new task of AAC error correction and hope to reduce such errors by post-processing AAC outputs. To tackle this pro…

    Submitted 16 June, 2023; originally announced June 2023.

    Comments: Accepted by NCMMSC 2022

  46. arXiv:2306.08258  [pdf, other]

    eess.SY

    Transmission and Distribution Coordination for DER-rich Energy Markets: A Parametric Programming Approach

    Authors: Mohammad Mousavi, Meng Wu

    Abstract: In this paper, a framework is proposed to coordinate the operation of the independent system operator (ISO) and distribution system operator (DSO). The framework is compatible with current practice of the U.S. wholesale market to enable massive distributed energy resources (DERs) to participate in the wholesale market. The DSO builds a bid-in cost function to be submitted to the ISO market through…

    Submitted 14 June, 2023; originally announced June 2023.

    Comments: 10 pages

  47. An Attentive-based Generative Model for Medical Image Synthesis

    Authors: Jiayuan Wang, Q. M. Jonathan Wu, Farhad Pourpanah

    Abstract: Magnetic resonance (MR) and computer tomography (CT) imaging are valuable tools for diagnosing diseases and planning treatment. However, limitations such as radiation exposure and cost can restrict access to certain imaging modalities. To address this issue, medical image synthesis can generate one modality from another, but many existing models struggle with high-quality image synthesis when mult…

    Submitted 2 June, 2023; originally announced June 2023.

  48. Enhance Temporal Relations in Audio Captioning with Sound Event Detection

    Authors: Zeyu Xie, Xuenan Xu, Mengyue Wu, Kai Yu

    Abstract: Automated audio captioning aims at generating natural language descriptions for given audio clips, not only detecting and classifying sounds, but also summarizing the relationships between audio events. Recent research advances in audio captioning have introduced additional guidance to improve the accuracy of audio events in generated sentences. However, temporal relations between audio events hav…

    Submitted 18 July, 2024; v1 submitted 2 June, 2023; originally announced June 2023.

    Comments: Interspeech 2023

  49. arXiv:2306.00265  [pdf, other]

    cs.LG cs.AI cs.CV eess.IV stat.ML

    Doubly Robust Self-Training

    Authors: Banghua Zhu, Mingyu Ding, Philip Jacobson, Ming Wu, Wei Zhan, Michael Jordan, Jiantao Jiao

    Abstract: Self-training is an important technique for solving semi-supervised learning problems. It leverages unlabeled data by generating pseudo-labels and combining them with a limited labeled dataset for training. The effectiveness of self-training heavily relies on the accuracy of these pseudo-labels. In this paper, we introduce doubly robust self-training, a novel semi-supervised algorithm that provabl…

    Submitted 2 November, 2023; v1 submitted 31 May, 2023; originally announced June 2023.

  50. arXiv:2305.12553  [pdf, other]

    cs.GT cs.AI cs.MA eess.SY math.DS

    Markov $α$-Potential Games

    Authors: Xin Guo, Xinyu Li, Chinmay Maheshwari, Shankar Sastry, Manxi Wu

    Abstract: This paper proposes a new framework of Markov $α$-potential games to study Markov games. In this new framework, Markov games are shown to be Markov $α$-potential games, and the existence of an associated $α$-potential function is established. Any optimizer of an $α$-potential function is shown to be an $α$-stationary NE. Two important classes of practically significant Markov games, Markov congest…

    Submitted 9 March, 2024; v1 submitted 21 May, 2023; originally announced May 2023.

    Comments: 32 pages, 3 figures

    MSC Class: 91A68; 91A50; 91A15; 91A14; 91A10