Showing 1–50 of 145 results for author: Wu, S

Searching in archive eess.
  1. arXiv:2408.17252  [pdf, other]

    eess.SP

    A Homogeneous Graph Neural Network for Precoding and Power Allocation in Scalable Wireless Networks

    Authors: Mingjun Sun, Zeng Li, Shaochuan Wu, Yuanwei Liu, Guoyu Li, Tong Zhang

    Abstract: Deep learning is widely used in wireless communications but struggles with fixed neural network sizes, which limit their adaptability in environments where the number of users and antennas varies. To overcome this, this paper introduced a generalization strategy for precoding and power allocation in scalable wireless networks. Initially, we employ an innovative approach to abstract the wireless ne…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: This work is submitted to IEEE for possible publication

  2. arXiv:2408.14340  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG eess.AS

    Foundation Models for Music: A Survey

    Authors: Yinghao Ma, Anders Øland, Anton Ragni, Bleiz MacSen Del Sette, Charalampos Saitis, Chris Donahue, Chenghua Lin, Christos Plachouras, Emmanouil Benetos, Elio Quinton, Elona Shatri, Fabio Morreale, Ge Zhang, György Fazekas, Gus Xia, Huan Zhang, Ilaria Manco, Jiawen Huang, Julien Guinot, Liwei Lin, Luca Marinelli, Max W. Y. Lam, Megha Sharma, Qiuqiang Kong, Roger B. Dannenberg, et al. (18 additional authors not shown)

    Abstract: In recent years, foundation models (FMs) such as large language models (LLMs) and latent diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This comprehensive review examines state-of-the-art (SOTA) pre-trained models and foundation models in music, spanning from representation learning, generative learning and multimodal learning. We first contextualise the signifi…

    Submitted 27 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

  3. arXiv:2408.12329  [pdf, ps, other]

    cs.IT eess.SP

    Asynchronous Cell-Free Massive MIMO-OFDM: Mixed Coherent and Non-Coherent Transmissions

    Authors: Guoyu Li, Shaochuan Wu, Changsheng You, Wenbin Zhang, Guanyu Shang

    Abstract: In this letter, we analyze the performance of mixed coherent and non-coherent transmissions approach, which can improve the performance of cell-free multiple-input multiple-output orthogonal frequency division multiplexing (CF mMIMO-OFDM) systems under asynchronous reception. To this end, we first obtain the achievable downlink sum-rate for the mixed coherent and non-coherent transmissions, and th…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: This work is submitted to IEEE for possible publication

  4. arXiv:2407.16564  [pdf, other]

    cs.SD cs.AI eess.AS

    Audio Prompt Adapter: Unleashing Music Editing Abilities for Text-to-Music with Lightweight Finetuning

    Authors: Fang-Duo Tsai, Shih-Lun Wu, Haven Kim, Bo-Yu Chen, Hao-Chung Cheng, Yi-Hsuan Yang

    Abstract: Text-to-music models allow users to generate nearly realistic musical audio with textual commands. However, editing music audios remains challenging due to the conflicting desiderata of performing fine-grained alterations on the audio while maintaining a simple user interface. To address this challenge, we propose Audio Prompt Adapter (or AP-Adapter), a lightweight addition to pretrained text-to-m…

    Submitted 24 July, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

    Comments: Accepted by the 25th International Society for Music Information Retrieval (ISMIR)

  5. arXiv:2407.14003  [pdf, other]

    stat.ML cs.LG eess.IV stat.ME

    Time Series Generative Learning with Application to Brain Imaging Analysis

    Authors: Zhenghao Li, Sanyou Wu, Long Feng

    Abstract: This paper focuses on the analysis of sequential image data, particularly brain imaging data such as MRI, fMRI, CT, with the motivation of understanding the brain aging process and neurodegenerative diseases. To achieve this goal, we investigate image generation in a time series context. Specifically, we formulate a min-max problem derived from the $f$-divergence between neighboring pairs to learn…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 45 pages

  6. arXiv:2407.11094  [pdf, other]

    stat.ME eess.SP stat.ML

    Robust Score-Based Quickest Change Detection

    Authors: Sean Moushegian, Suya Wu, Enmao Diao, Jie Ding, Taposh Banerjee, Vahid Tarokh

    Abstract: Methods in the field of quickest change detection rapidly detect in real-time a change in the data-generating distribution of an online data stream. Existing methods have been able to detect this change point when the densities of the pre- and post-change distributions are known. Recent work has extended these results to the case where the pre- and post-change distributions are known only by their…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2306.05091

  7. arXiv:2407.09562  [pdf, other]

    cs.CV eess.IV

    Edge AI-Enabled Chicken Health Detection Based on Enhanced FCOS-Lite and Knowledge Distillation

    Authors: Qiang Tong, Jinrui Wang, Wenshuang Yang, Songtao Wu, Wenqi Zhang, Chen Sun, Kuanhong Xu

    Abstract: The utilization of AIoT technology has become a crucial trend in modern poultry management, offering the potential to optimize farming operations and reduce human workloads. This paper presents a real-time and compact edge-AI enabled detector designed to identify chickens and their healthy statuses using frames captured by a lightweight and intelligent camera equipped with an edge-AI enabled CMOS…

    Submitted 3 July, 2024; originally announced July 2024.

  8. arXiv:2407.02277  [pdf, other]

    cs.SD eess.AS

    MelodyT5: A Unified Score-to-Score Transformer for Symbolic Music Processing

    Authors: Shangda Wu, Yashan Wang, Xiaobing Li, Feng Yu, Maosong Sun

    Abstract: In the domain of symbolic music research, the progress of developing scalable systems has been notably hindered by the scarcity of available training data and the demand for models tailored to specific tasks. To address these issues, we propose MelodyT5, a novel unified framework that leverages an encoder-decoder architecture tailored for symbolic music processing in ABC notation. This framework c…

    Submitted 3 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: 9 pages, 2 figures, 3 tables, accepted by ISMIR 2024

  9. arXiv:2407.00743  [pdf, other]

    cs.MM cs.AI cs.CL eess.AS

    AIMDiT: Modality Augmentation and Interaction via Multimodal Dimension Transformation for Emotion Recognition in Conversations

    Authors: Sheng Wu, Jiaxing Liu, Longbiao Wang, Dongxiao He, Xiaobao Wang, Jianwu Dang

    Abstract: Emotion Recognition in Conversations (ERC) is a popular task in natural language processing, which aims to recognize the emotional state of the speaker in conversations. While current research primarily emphasizes contextual modeling, there exists a dearth of investigation into effective multimodal fusion methods. We propose a novel framework called AIMDiT to solve the problem of multimodal fusion…

    Submitted 12 April, 2024; originally announced July 2024.

  10. arXiv:2406.09326  [pdf, other]

    cs.SD cs.AI cs.CV cs.MM eess.AS

    PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance

    Authors: Qijun Gan, Song Wang, Shengtao Wu, Jianke Zhu

    Abstract: Recently, artificial intelligence techniques for education have been received increasing attentions, while it still remains an open problem to design the effective music instrument instructing systems. Although key presses can be directly derived from sheet music, the transitional movements among key presses require more extensive guidance in piano performance. In this work, we construct a piano-h…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Codes and Dataset: https://agnjason.github.io/PianoMotion-page

  11. arXiv:2406.07532  [pdf, other]

    cs.SD cs.CV cs.LG eess.AS

    Hearing Anything Anywhere

    Authors: Mason Wang, Ryosuke Sawata, Samuel Clarke, Ruohan Gao, Shangzhe Wu, Jiajun Wu

    Abstract: Recent years have seen immense progress in 3D computer vision and computer graphics, with emerging tools that can virtualize real-world 3D environments for numerous Mixed Reality (XR) applications. However, alongside immersive visual experiences, immersive auditory experiences are equally vital to our holistic perception of an environment. In this paper, we aim to reconstruct the spatial acoustic…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: CVPR 2024. The first two authors contributed equally. Project page: https://masonlwang.com/hearinganythinganywhere/

    ACM Class: I.2.10; I.4.8

  12. arXiv:2406.07256  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection

    Authors: Rong Gong, Hongfei Xue, Lezhi Wang, Xin Xu, Qisheng Li, Lei Xie, Hui Bu, Shaomei Wu, Jiaming Zhou, Yong Qin, Binbin Zhang, Jun Du, Jia Bin, Ming Li

    Abstract: The rapid advancements in speech technologies over the past two decades have led to human-level performance in tasks like automatic speech recognition (ASR) for fluent speech. However, the efficacy of these models diminishes when applied to atypical speech, such as stuttering. This paper introduces AS-70, the first publicly available Mandarin stuttered speech dataset, which stands out as the large…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  13. arXiv:2406.01153  [pdf, other]

    eess.SY

    Safety-Critical Control of Euler-Lagrange Systems Subject to Multiple Obstacles and Velocity Constraints

    Authors: Zhi Liu, Si Wu, Tengfei Liu, Zhong-Ping Jiang

    Abstract: This paper studies the safety-critical control problem for Euler-Lagrange (EL) systems subject to multiple ball obstacles and velocity constraints in accordance with affordable velocity ranges. A key strategy is to exploit the underlying inner-outer-loop structure for the design of a new cascade controller for the class of EL systems. In particular, the outer-loop controller is developed based on…

    Submitted 3 June, 2024; originally announced June 2024.

  14. arXiv:2406.01058  [pdf, other]

    eess.SY

    Constructive Safety Control

    Authors: Si Wu, Tengfei Liu, Zhong-Ping Jiang

    Abstract: This paper proposes a constructive approach to safety control of nonlinear cascade systems subject to multiple state constraints. New design ingredients include a unified characterization of safety and stability for systematic designs of safety controllers, and a novel technique of reshaping the feasible sets of quadratically constrained quadratic programming induced from safety control. The propo…

    Submitted 3 June, 2024; originally announced June 2024.

  15. arXiv:2405.20073  [pdf, other]

    cs.IT eess.SP

    Power Allocation for Cell-Free Massive MIMO ISAC Systems with OTFS Signal

    Authors: Yifei Fan, Shaochuan Wu, Xixi Bi, Guoyu Li

    Abstract: Applying integrated sensing and communication (ISAC) to a cell-free massive multiple-input multiple-output (CF mMIMO) architecture has attracted increasing attention. This approach equips CF mMIMO networks with sensing capabilities and resolves the problem of unreliable service at cell edges in conventional cellular networks. However, existing studies on CF-ISAC systems have focused on the applica…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: This work is submitted to IEEE for possible publication

  16. arXiv:2405.19204  [pdf, other]

    eess.IV cs.CV

    Contrastive-Adversarial and Diffusion: Exploring pre-training and fine-tuning strategies for sulcal identification

    Authors: Michail Mamalakis, Héloïse de Vareilles, Shun-Chin Jim Wu, Ingrid Agartz, Lynn Egeland Mørch-Johnsen, Jane Garrison, Jon Simons, Pietro Lio, John Suckling, Graham Murray

    Abstract: In the last decade, computer vision has witnessed the establishment of various training and learning approaches. Techniques like adversarial learning, contrastive learning, diffusion denoising learning, and ordinary reconstruction learning have become standard, representing state-of-the-art methods extensively employed for fully training or pre-training networks across various vision tasks. The ex…

    Submitted 29 May, 2024; originally announced May 2024.

  17. arXiv:2405.09936  [pdf]

    eess.SY

    Collaborative planning of integrated hydrogen energy chain multi-energy systems: A review

    Authors: Xinning Yia, Tianguang Lua, Jing Li, Shaocong Wu

    Abstract: Most planning of the traditional hydrogen energy supply chain (HSC) focuses on the storage and transportation links between production and consumption ends. It ignores the energy flows and interactions between each link, making it unsuitable for energy system planning analysis. Therefore, we propose the concept of a hydrogen energy chain (HEC) based on the HSC, which emphasizes the interactions be…

    Submitted 16 May, 2024; originally announced May 2024.

  18. arXiv:2404.06393  [pdf, other]

    cs.SD cs.AI eess.AS

    MuPT: A Generative Symbolic Music Pretrained Transformer

    Authors: Xingwei Qu, Yuelin Bai, Yinghao Ma, Ziya Zhou, Ka Man Lo, Jiaheng Liu, Ruibin Yuan, Lejun Min, Xueling Liu, Tianyu Zhang, Xinrun Du, Shuyue Guo, Yiming Liang, Yizhi Li, Shangda Wu, Junting Zhou, Tianyu Zheng, Ziyang Ma, Fengze Han, Wei Xue, Gus Xia, Emmanouil Benetos, Xiang Yue, Chenghua Lin, Xu Tan, et al. (4 additional authors not shown)

    Abstract: In this paper, we explore the application of Large Language Models (LLMs) to the pre-training of music. While the prevalent use of MIDI in music modeling is well-established, our findings suggest that LLMs are inherently more compatible with ABC Notation, which aligns more closely with their design and strengths, thereby enhancing the model's performance in musical composition. To address the chal…

    Submitted 10 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  19. arXiv:2404.00362  [pdf, other]

    cs.CV eess.IV

    STBA: Towards Evaluating the Robustness of DNNs for Query-Limited Black-box Scenario

    Authors: Renyang Liu, Kwok-Yan Lam, Wei Zhou, Sixing Wu, Jun Zhao, Dongting Hu, Mingming Gong

    Abstract: Many attack techniques have been proposed to explore the vulnerability of DNNs and further help to improve their robustness. Despite the significant progress made recently, existing black-box attack methods still suffer from unsatisfactory performance due to the vast number of queries needed to optimize desired perturbations. Besides, the other critical challenge is that adversarial examples built…

    Submitted 30 March, 2024; originally announced April 2024.

  20. arXiv:2403.20142  [pdf, other]

    cs.CV eess.IV

    StegoGAN: Leveraging Steganography for Non-Bijective Image-to-Image Translation

    Authors: Sidi Wu, Yizi Chen, Samuel Mermet, Lorenz Hurni, Konrad Schindler, Nicolas Gonthier, Loic Landrieu

    Abstract: Most image-to-image translation models postulate that a unique correspondence exists between the semantic classes of the source and target domains. However, this assumption does not always hold in real-world scenarios due to divergent distributions, different class sets, and asymmetrical information representation. As conventional GANs attempt to generate images that match the distribution of the…

    Submitted 29 March, 2024; originally announced March 2024.

  21. arXiv:2403.09651  [pdf, other]

    cs.CV eess.IV

    Precision Agriculture: Crop Mapping using Machine Learning and Sentinel-2 Satellite Imagery

    Authors: Kui Zhao, Siyang Wu, Chang Liu, Yue Wu, Natalia Efremova

    Abstract: Food security has grown in significance due to the changing climate and its warming effects. To support the rising demand for agricultural products and to minimize the negative impact of climate change and mass cultivation, precision agriculture has become increasingly important for crop cultivation. This study employs deep learning and pixel-based machine learning methods to accurately segment la…

    Submitted 25 November, 2023; originally announced March 2024.

  22. arXiv:2403.07317  [pdf, other]

    eess.SY

    GMPC: Geometric Model Predictive Control for Wheeled Mobile Robot Trajectory Tracking

    Authors: Jiawei Tang, Shuang Wu, Bo Lan, Yahui Dong, Yuqiang Jin, Guangjian Tian, Wen-An Zhang, Ling Shi

    Abstract: The configuration of most robotic systems lies in continuous transformation groups. However, in mobile robot trajectory tracking, many recent works still naively utilize optimization methods for elements in vector space without considering the manifold constraint of the robot configuration. In this letter, we propose a geometric model predictive control (MPC) framework for wheeled mobile robot tra…

    Submitted 12 March, 2024; originally announced March 2024.

  23. arXiv:2403.06167  [pdf, other]

    eess.SY

    Direct Shooting Method for Numerical Optimal Control: A Modified Transcription Approach

    Authors: Jiawei Tang, Yuxing Zhong, Pengyu Wang, Xingzhou Chen, Shuang Wu, Ling Shi

    Abstract: Direct shooting is an efficient method to solve numerical optimal control. It utilizes the Runge-Kutta scheme to discretize a continuous-time optimal control problem making the problem solvable by nonlinear programming solvers. However, conventional direct shooting raises a contradictory dynamics issue when using an augmented state to handle high-order systems. This paper fills the research gap…

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: Accepted by ECC24

  24. arXiv:2402.16153  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG cs.MM eess.AS

    ChatMusician: Understanding and Generating Music Intrinsically with LLM

    Authors: Ruibin Yuan, Hanfeng Lin, Yi Wang, Zeyue Tian, Shangda Wu, Tianhao Shen, Ge Zhang, Yuhang Wu, Cong Liu, Ziya Zhou, Ziyang Ma, Liumeng Xue, Ziyu Wang, Qin Liu, Tianyu Zheng, Yizhi Li, Yinghao Ma, Yiming Liang, Xiaowei Chi, Ruibo Liu, Zili Wang, Pengfei Li, Jingcheng Wu, Chenghua Lin, Qifeng Liu, et al. (10 additional authors not shown)

    Abstract: While Large Language Models (LLMs) demonstrate impressive capabilities in text generation, we find that their ability has yet to be generalized to music, humanity's creative language. We introduce ChatMusician, an open-source LLM that integrates intrinsic musical abilities. It is based on continual pre-training and finetuning LLaMA2 on a text-compatible music representation, ABC notation, and the…

    Submitted 25 February, 2024; originally announced February 2024.

    Comments: GitHub: https://shanghaicannon.github.io/ChatMusician/

  25. arXiv:2402.08768  [pdf, other]

    eess.IV cs.LG

    Adversarially Robust Feature Learning for Breast Cancer Diagnosis

    Authors: Degan Hao, Dooman Arefan, Margarita Zuley, Wendie Berg, Shandong Wu

    Abstract: Adversarial data can lead to malfunction of deep learning applications. It is essential to develop deep learning models that are robust to adversarial data while accurate on standard, clean data. In this study, we proposed a novel adversarially robust feature learning (ARFL) method for a real-world application of breast cancer diagnosis. ARFL facilitates adversarial training using both standard da…

    Submitted 13 February, 2024; originally announced February 2024.

  26. arXiv:2401.12263  [pdf, ps, other]

    eess.SY math.PR

    Maintenance policy for a system with a weighted linear combination of degradation processes

    Authors: Shaomin Wu, Inma T. Castro

    Abstract: This paper develops maintenance policies for a system under condition monitoring. We assume that a number of defects may develop and the degradation process of each defect follows a gamma process, respectively. The system is inspected periodically and maintenance actions are performed on the defects present in the system. The effectiveness of the maintenance is assumed imperfect and it is modelled…

    Submitted 22 January, 2024; originally announced January 2024.

  27. arXiv:2401.11675  [pdf, other]

    eess.IV

    Rethinking Cross-Attention for Infrared and Visible Image Fusion

    Authors: Lihua Jian, Songlei Xiong, Han Yan, Xiaoguang Niu, Shaowu Wu, Di Zhang

    Abstract: The salient information of an infrared image and the abundant texture of a visible image can be fused to obtain a comprehensive image. As can be known, the current fusion methods based on Transformer techniques for infrared and visible (IV) images have exhibited promising performance. However, the attention mechanism of the previous Transformer-based methods was prone to extract common information…

    Submitted 21 January, 2024; originally announced January 2024.

  28. arXiv:2312.09442  [pdf]

    eess.SP cs.AI cs.LG

    A Compact LSTM-SVM Fusion Model for Long-Duration Cardiovascular Diseases Detection

    Authors: Siyang Wu

    Abstract: Globally, cardiovascular diseases (CVDs) are the leading cause of mortality, accounting for an estimated 17.9 million deaths annually. One critical clinical objective is the early detection of CVDs using electrocardiogram (ECG) data, an area that has received significant attention from the research community. Recent advancements based on machine learning and deep learning have achieved great progr…

    Submitted 23 January, 2024; v1 submitted 20 November, 2023; originally announced December 2023.

  29. arXiv:2312.07258  [pdf, other]

    cs.CV eess.IV

    SSTA: Salient Spatially Transformed Attack

    Authors: Renyang Liu, Wei Zhou, Sixin Wu, Jun Zhao, Kwok-Yan Lam

    Abstract: Extensive studies have demonstrated that deep neural networks (DNNs) are vulnerable to adversarial attacks, which brings a huge security risk to the further application of DNNs, especially for the AI models developed in the real world. Despite the significant progress that has been made recently, existing attack methods still suffer from the unsatisfactory performance of escaping from being detect…

    Submitted 12 December, 2023; originally announced December 2023.

  30. arXiv:2311.16783  [pdf, ps, other]

    eess.SP

    A General 3D Non-Stationary 5G Wireless Channel Model

    Authors: Shangbin Wu, Cheng-Xiang Wang, el-Hadi M. Aggoune, Mohammed M. Alwakeel, Xiao-Hu You

    Abstract: A novel unified framework of geometry-based stochastic models (GBSMs) for the fifth generation (5G) wireless communication systems is proposed in this paper. The proposed general 5G channel model aims at capturing small-scale fading channel characteristics of key 5G communication scenarios, such as massive multiple-input multiple-output (MIMO), high-speed train (HST), vehicle-to-vehicle (V2V), and…

    Submitted 28 November, 2023; originally announced November 2023.

  31. arXiv:2311.13688  [pdf, other]

    eess.IV cs.CV cs.LG

    Masked Conditional Diffusion Models for Image Analysis with Application to Radiographic Diagnosis of Infant Abuse

    Authors: Shaoju Wu, Sila Kurugol, Andy Tsai

    Abstract: The classic metaphyseal lesion (CML) is a distinct injury that is highly specific for infant abuse. It commonly occurs in the distal tibia. To aid radiologists detect these subtle fractures, we need to develop a model that can flag abnormal distal tibial radiographs (i.e. those with CMLs). Unfortunately, the development of such a model requires a large and diverse training database, which is often…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: Accepted by MICCAI DALI 2023

  32. arXiv:2311.07069  [pdf, other]

    cs.SD eess.AS

    Music ControlNet: Multiple Time-varying Controls for Music Generation

    Authors: Shih-Lun Wu, Chris Donahue, Shinji Watanabe, Nicholas J. Bryan

    Abstract: Text-to-music generation models are now capable of generating high-quality music audio in broad styles. However, text control is primarily suitable for the manipulation of global musical attributes like genre, mood, and tempo, and is less suitable for precise control over time-varying attributes such as the positions of beats in time or the changing dynamics of the music. We propose Music ControlN…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: 11 pages, 4 figure, 5 tables, Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)

  33. arXiv:2311.04772  [pdf, other]

    eess.IV cs.CV

    GCS-ICHNet: Assessment of Intracerebral Hemorrhage Prognosis using Self-Attention with Domain Knowledge Integration

    Authors: Xuhao Shan, Xinyang Li, Ruiquan Ge, Shibin Wu, Ahmed Elazab, Jichao Zhu, Lingyan Zhang, Gangyong Jia, Qingying Xiao, Xiang Wan, Changmiao Wang

    Abstract: Intracerebral Hemorrhage (ICH) is a severe condition resulting from damaged brain blood vessel ruptures, often leading to complications and fatalities. Timely and accurate prognosis and management are essential due to its high mortality rate. However, conventional methods heavily rely on subjective clinician expertise, which can lead to inaccurate diagnoses and delays in treatment. Artificial inte…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: 6 pages, 3 figures, 5 tables, published to BIBM 2023

  34. arXiv:2311.01066  [pdf, other]

    eess.IV cs.CV

    Dynamic Multimodal Information Bottleneck for Multimodality Classification

    Authors: Yingying Fang, Shuang Wu, Sheng Zhang, Chaoyan Huang, Tieyong Zeng, Xiaodan Xing, Simon Walsh, Guang Yang

    Abstract: Effectively leveraging multimodal data such as various images, laboratory tests and clinical information is gaining traction in a variety of AI-based medical diagnosis and prognosis tasks. Most existing multi-modal techniques only focus on enhancing their performance by leveraging the differences or shared features from various modalities and fusing feature across different modalities. These appro…

    Submitted 25 November, 2023; v1 submitted 2 November, 2023; originally announced November 2023.

    Comments: WACV 2024

  35. arXiv:2310.09078  [pdf, other]

    cs.NI eess.SP

    DNFS-VNE: Deep Neuro Fuzzy System Driven Virtual Network Embedding

    Authors: Ailing Xiao, Ning Chen, Sheng Wu, Peiying Zhang, Linling Kuang, Chunxiao Jiang

    Abstract: By decoupling substrate resources, network virtualization (NV) is a promising solution for meeting diverse demands and ensuring differentiated quality of service (QoS). In particular, virtual network embedding (VNE) is a critical enabling technology that enhances the flexibility and scalability of network deployment by addressing the coupling of Internet processes and services. However, in the exi…

    Submitted 3 July, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

  36. arXiv:2310.04722  [pdf, other]

    cs.SD cs.AI eess.AS

    A Holistic Evaluation of Piano Sound Quality

    Authors: Monan Zhou, Shangda Wu, Shaohua Ji, Zijin Li, Wei Li

    Abstract: This paper aims to develop a holistic evaluation method for piano sound quality to assist in purchasing decisions. Unlike previous studies that focused on the effect of piano performance techniques on sound quality, this study evaluates the inherent sound quality of different pianos. To derive quality evaluation systems, the study uses subjective questionnaires based on a piano sound quality datas…

    Submitted 7 October, 2023; originally announced October 2023.

  37. arXiv:2309.17352  [pdf, other]

    cs.SD eess.AS

    Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation

    Authors: Shih-Lun Wu, Xuankai Chang, Gordon Wichern, Jee-weon Jung, François Germain, Jonathan Le Roux, Shinji Watanabe

    Abstract: Automated audio captioning (AAC) aims to generate informative descriptions for various sounds from nature and/or human activities. In recent years, AAC has quickly attracted research interest, with state-of-the-art systems now relying on a sequence-to-sequence (seq2seq) backbone powered by strong models such as Transformers. Following the macro-trend of applied machine learning research, in this w…

    Submitted 9 January, 2024; v1 submitted 29 September, 2023; originally announced September 2023.

    Comments: ICASSP 2024 camera-ready paper. Winner of the DCASE 2023 Challenge Task 6A: Automated Audio Captioning (AAC)

  38. arXiv:2309.13259  [pdf, other]

    cs.IR cs.AI cs.SD eess.AS

    WikiMT++ Dataset Card

    Authors: Monan Zhou, Shangda Wu, Yuan Wang, Wei Li

    Abstract: WikiMT++ is an expanded and refined version of WikiMusicText (WikiMT), featuring 1010 curated lead sheets in ABC notation. To expand application scenarios of WikiMT, we add both objective (album, lyrics, video) and subjective emotion (12 emotion adjectives) and emo_4q (Russell 4Q) attributes, enhancing its usability for music information retrieval, conditional music generation, automatic composit…

    Submitted 23 September, 2023; originally announced September 2023.

  39. arXiv:2309.09180  [pdf, other]

    eess.AS cs.AI cs.SD

    Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding with Sequence-to-Sequence Architecture

    Authors: Gaobin Yang, Maokui He, Shutong Niu, Ruoyu Wang, Yanyan Yue, Shuangqing Qian, Shilong Wu, Jun Du, Chin-Hui Lee

    Abstract: We propose a novel neural speaker diarization system using memory-aware multi-speaker embedding with sequence-to-sequence architecture (NSD-MS2S), which integrates the strengths of memory-aware multi-speaker embedding (MA-MSE) and sequence-to-sequence (Seq2Seq) architecture, leading to improvement in both efficiency and performance. Next, we further decrease the memory occupation of decoding by in…

    Submitted 26 December, 2023; v1 submitted 17 September, 2023; originally announced September 2023.

    Comments: Accepted by ICASSP 2024

  40. arXiv:2309.08348  [pdf, other]

    eess.AS cs.SD

    The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction

    Authors: Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang, Hongbo Lan, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, Jianqing Gao

    Abstract: Previous Multimodal Information based Speech Processing (MISP) challenges mainly focused on audio-visual speech recognition (AVSR) with commendable success. However, the most advanced back-end recognition systems often hit performance limits due to the complex acoustic environments. This has prompted a shift in focus towards the Audio-Visual Target Speaker Extraction (AVTSE) task for the MISP 2023…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: 5 pages, 4 figures

  41. arXiv:2309.07940  [pdf, other]

    eess.IV

    CvFormer: Cross-view transFormers with Pre-training for fMRI Analysis of Human Brain

    Authors: Xiangzhu Meng, Qiang Liu, Shu Wu, Liang Wang

    Abstract: In recent years, functional magnetic resonance imaging (fMRI) has been widely utilized to diagnose neurological disease, by exploiting the region of interest (RoI) nodes as well as their connectivities in human brain. However, most of existing works only rely on either RoIs or connectivities, neglecting the potential for complementary information between them. To address this issue, we study how t…

    Submitted 14 September, 2023; originally announced September 2023.

  42. arXiv:2308.14638  [pdf, other]

    eess.AS cs.SD

    The USTC-NERCSLIP Systems for the CHiME-7 DASR Challenge

    Authors: Ruoyu Wang, Maokui He, Jun Du, Hengshun Zhou, Shutong Niu, Hang Chen, Yanyan Yue, Gaobin Yang, Shilong Wu, Lei Sun, Yanhui Tu, Haitao Tang, Shuangqing Qian, Tian Gao, Mengzhi Wang, Genshun Wan, Jia Pan, Jianqing Gao, Chin-Hui Lee

    Abstract: This technical report details our submission system to the CHiME-7 DASR Challenge, which focuses on speaker diarization and speech recognition under complex multi-speaker scenarios. It also evaluates the efficiency of systems in handling diverse array devices. To address these issues, we implemented an end-to-end speaker diarization system and introduced a rectification strategy base…

    Submitted 10 October, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

    Comments: Accepted by 2023 CHiME Workshop, Oral

  43. arXiv:2307.08688  [pdf, other]

    eess.AS

    Semi-supervised multi-channel speaker diarization with cross-channel attention

    Authors: Shilong Wu, Jun Du, Maokui He, Shutong Niu, Hang Chen, Haitao Tang, Chin-Hui Lee

    Abstract: Most neural speaker diarization systems rely on sufficient manual training data labels, which are hard to collect under real-world scenarios. This paper proposes a semi-supervised speaker diarization system to utilize large-scale multi-channel training data by generating pseudo-labels for unlabeled data. Furthermore, we introduce cross-channel attention into the Neural Speaker Diarization Using Me…

    Submitted 17 July, 2023; originally announced July 2023.

    Comments: 8 pages, 3 figures

  44. Antenna Impedance Estimation in Correlated Rayleigh Fading Channels

    Authors: Shaohan Wu, Brian Hughes

    Abstract: We formulate antenna impedance estimation in a classical estimation framework under correlated Rayleigh fading channels. Based on training sequences of multiple packets, we derive the ML estimators for antenna impedance and channel variance, treating the fading path gains as nuisance parameters. These ML estimators can be found via scalar optimization. We explore the efficiency of these estimators…

    Submitted 7 July, 2023; originally announced July 2023.

    Comments: 5 pages, 2 figures, ICASSP 2023. arXiv admin note: substantial text overlap with arXiv:2006.11443

  45. arXiv:2307.02148  [pdf]

    eess.IV cs.CV

    Compound Attention and Neighbor Matching Network for Multi-contrast MRI Super-resolution

    Authors: Wenxuan Chen, Sirui Wu, Shuai Wang, Zhongsen Li, Jia Yang, Huifeng Yao, Xiaolei Song

    Abstract: Multi-contrast magnetic resonance imaging (MRI) reflects information about human tissue from different perspectives and has many clinical applications. By utilizing the complementary information among different modalities, multi-contrast super-resolution (SR) of MRI can achieve better results than single-image super-resolution. However, existing methods of multi-contrast MRI SR have the following…

    Submitted 16 September, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  46. arXiv:2307.00535  [pdf, other]

    cs.IT eess.SP

    Goal-oriented Tensor: Beyond Age of Information Towards Semantics-Empowered Goal-Oriented Communications

    Authors: Aimin Li, Shaohua Wu, Sumei Sun, Jie Cao

    Abstract: Optimizations premised on open-loop metrics such as Age of Information (AoI) indirectly enhance the system's decision-making utility. We therefore propose a novel closed-loop metric named Goal-oriented Tensor (GoT) to directly quantify the impact of semantic mismatches on goal-oriented decision-making utility. Leveraging the GoT, we consider a sampler & decision-maker pair that works collaborative…

    Submitted 2 July, 2023; originally announced July 2023.

    Comments: 30 pages, 9 figures. arXiv admin note: substantial text overlap with arXiv:2305.04083

  47. arXiv:2306.17790  [pdf, other]

    eess.SP physics.atom-ph

    Theoretical Analysis of Heterodyne Rydberg Atomic Receiver Sensitivity Based on Transit Relaxation Effect and Frequency Detuning

    Authors: Shanchi Wu, Chen Gong, Shangbin Li, Rui Ni, Jinkang Zhu

    Abstract: We conduct a theoretical investigation into the impacts of local microwave electric field frequency detuning, laser frequency detuning, and transit relaxation rate on enhancing heterodyne Rydberg atomic receiver sensitivity. To optimize the output signal amplitude given the input microwave signal, we derive the steady-state solutions of the atomic density matrix. Numerical results show that laser…

    Submitted 30 June, 2023; originally announced June 2023.

    Comments: 9 pages, 9 figures, 19 references

  48. arXiv:2306.05091  [pdf, other]

    stat.ME eess.SP

    Robust Quickest Change Detection for Unnormalized Models

    Authors: Suya Wu, Enmao Diao, Taposh Banerjee, Jie Ding, Vahid Tarokh

    Abstract: Detecting an abrupt and persistent change in the underlying distribution of online data streams is an important problem in many applications. This paper proposes a new robust score-based algorithm called RSCUSUM, which can be applied to unnormalized models and addresses the issue of unknown post-change distributions. RSCUSUM replaces the Kullback-Leibler divergence with the Fisher divergence betwe…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: Accepted for the 39th Conference on Uncertainty in Artificial Intelligence (UAI 2023). arXiv admin note: text overlap with arXiv:2302.00250

  49. arXiv:2306.01247  [pdf, other]

    eess.AS

    Tensor decomposition for minimization of E2E SLU model toward on-device processing

    Authors: Yosuke Kashiwagi, Siddhant Arora, Hayato Futami, Jessica Huynh, Shih-Lun Wu, Yifan Peng, Brian Yan, Emiru Tsunoo, Shinji Watanabe

    Abstract: Spoken Language Understanding (SLU) is a critical speech recognition application and is often deployed on edge devices. Consequently, on-device processing plays a significant role in the practical implementation of SLU. This paper focuses on the end-to-end (E2E) SLU model because of its low latency compared with a cascade system, and aims to minimize the computational cost. We reduce the model si…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: Accepted by INTERSPEECH 2023

  50. arXiv:2305.19005  [pdf, other]

    cs.IT eess.SP

    Hybrid Driven Learning for Channel Estimation in Intelligent Reflecting Surface Aided Millimeter Wave Communications

    Authors: Shuntian Zheng, Sheng Wu, Chunxiao Jiang, Wei Zhang, Xiaojun Jing

    Abstract: Intelligent reflecting surfaces (IRS) have been proposed in millimeter wave (mmWave) and terahertz (THz) systems to achieve both coverage and capacity enhancement, where the design of hybrid precoders, combiners, and the IRS typically relies on channel state information. In this paper, we address the problem of uplink wideband channel estimation for IRS aided multiuser multiple-input single-output…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: 30 pages, 8 figures, submitted to IEEE transactions on wireless communications on December 13, 2022