Showing 1–50 of 209 results for author: Wu, C

Searching in archive eess.
  1. arXiv:2408.16725  [pdf, other]

    cs.AI cs.CL cs.HC cs.LG cs.SD eess.AS

    Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming

    Authors: Zhifei Xie, Changqiao Wu

    Abstract: Recent advances in language models have achieved significant progress. GPT-4o, as a new milestone, has enabled real-time conversations with humans, demonstrating near-human natural fluency. Such human-computer interaction necessitates models with the capability to perform reasoning directly with the audio modality and generate output in streaming. However, this remains beyond the reach of current…

    Submitted 29 August, 2024; v1 submitted 29 August, 2024; originally announced August 2024.

    Comments: Technical report, work in progress. Demo and code: https://github.com/gpt-omni/mini-omni

  2. arXiv:2408.16030  [pdf]

    cs.SD cs.AI cs.LG eess.AS

    A Deep Learning Approach to Localizing Multi-level Airway Collapse Based on Snoring Sounds

    Authors: Ying-Chieh Hsu, Stanley Yung-Chuan Liu, Chao-Jung Huang, Chi-Wei Wu, Ren-Kai Cheng, Jane Yung-Jen Hsu, Shang-Ran Huang, Yuan-Ren Cheng, Fu-Shun Hsu

    Abstract: This study investigates the application of machine/deep learning to classify snoring sounds excited at different levels of the upper airway in patients with obstructive sleep apnea (OSA) using data from drug-induced sleep endoscopy (DISE). The snoring sounds of 39 subjects were analyzed and labeled according to the Velum, Oropharynx, Tongue Base, and Epiglottis (VOTE) classification system. The da…

    Submitted 28 August, 2024; originally announced August 2024.

  3. arXiv:2408.13290  [pdf, ps, other]

    eess.IV cs.CV

    Multi-modal Intermediate Feature Interaction AutoEncoder for Overall Survival Prediction of Esophageal Squamous Cell Cancer

    Authors: Chengyu Wu, Yatao Zhang, Yaqi Wang, Qifeng Wang, Shuai Wang

    Abstract: Survival prediction for esophageal squamous cell cancer (ESCC) is crucial for doctors to assess a patient's condition and tailor treatment plans. The application and development of multi-modal deep learning in this field have attracted attention in recent years. However, the prognostically relevant features between cross-modalities have not been further explored in previous studies, which could hi…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: Accepted by ISBI 2024

  4. arXiv:2408.07897  [pdf, other]

    cs.LG cs.IR cs.MA eess.SY

    The Nah Bandit: Modeling User Non-compliance in Recommendation Systems

    Authors: Tianyue Zhou, Jung-Hoon Cho, Cathy Wu

    Abstract: Recommendation systems now pervade the digital world, ranging from advertising to entertainment. However, it remains challenging to implement effective recommendation systems in the physical world, such as in mobility or health. This work focuses on a key challenge: in the physical world, it is often easy for the user to opt out of taking any recommendation if they are not to her liking, and to fa…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 12 pages, 8 figures, under review

  5. Deep Inertia $L_p$ Half-Quadratic Splitting Unrolling Network for Sparse View CT Reconstruction

    Authors: Yu Guo, Caiying Wu, Yaxin Li, Qiyu Jin, Tieyong Zeng

    Abstract: Sparse view computed tomography (CT) reconstruction poses a challenging ill-posed inverse problem, necessitating effective regularization techniques. In this letter, we employ $L_p$-norm ($0<p<1$) regularization to induce sparsity and introduce inertial steps, leading to the development of the inertial $L_p$-norm half-quadratic splitting algorithm. We rigorously prove the convergence of this algor…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: This paper was accepted by IEEE Signal Processing Letters on July 28, 2024

    Journal ref: IEEE Signal Processing Letters, 2024, 31:2030-2034

  6. arXiv:2408.05609  [pdf, other]

    eess.SY cs.AI cs.LG cs.MA cs.RO

    Mitigating Metropolitan Carbon Emissions with Dynamic Eco-driving at Scale

    Authors: Vindula Jayawardana, Baptiste Freydt, Ao Qu, Cameron Hickert, Edgar Sanchez, Catherine Tang, Mark Taylor, Blaine Leonard, Cathy Wu

    Abstract: The sheer scale and diversity of transportation make it a formidable sector to decarbonize. Here, we consider an emerging opportunity to reduce carbon emissions: the growing adoption of semi-autonomous vehicles, which can be programmed to mitigate stop-and-go traffic through intelligent speed commands and, thus, reduce emissions. But would such dynamic eco-driving move the needle on climate change…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: In review

  7. arXiv:2408.03588  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD

    Facing the Music: Tackling Singing Voice Separation in Cinematic Audio Source Separation

    Authors: Karn N. Watcharasupat, Chih-Wei Wu, Iroro Orife

    Abstract: Cinematic audio source separation (CASS), as a standalone problem of extracting individual stems from their mixture, is a fairly new subtask of audio source separation. A typical setup of CASS is a three-stem problem, with the aim of separating the mixture into the dialogue (DX), music (MX), and effects (FX) stems. Given the creative nature of cinematic sound production, however, several edge case…

    Submitted 25 August, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

    Comments: Submitted to the Late-Breaking Demo Session of the 25th International Society for Music Information Retrieval (ISMIR) Conference, 2024

  8. arXiv:2407.16684  [pdf, other]

    eess.IV cs.CV q-bio.NC

    AutoRG-Brain: Grounded Report Generation for Brain MRI

    Authors: Jiayu Lei, Xiaoman Zhang, Chaoyi Wu, Lisong Dai, Ya Zhang, Yanyong Zhang, Yanfeng Wang, Weidi Xie, Yuehua Li

    Abstract: Radiologists are tasked with interpreting a large number of images on a daily basis, with the responsibility of generating corresponding reports. This demanding workload elevates the risk of human error, potentially leading to treatment delays, increased healthcare costs, revenue loss, and operational inefficiencies. To address these challenges, we initiate a series of work on grounded Automatic Re…

    Submitted 29 July, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

  9. arXiv:2407.07275  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Remastering Divide and Remaster: A Cinematic Audio Source Separation Dataset with Multilingual Support

    Authors: Karn N. Watcharasupat, Chih-Wei Wu, Iroro Orife

    Abstract: Cinematic audio source separation (CASS), as a problem of extracting the dialogue, music, and effects stems from their mixture, is a relatively new subtask of audio source separation. To date, only one publicly available dataset exists for CASS, that is, the Divide and Remaster (DnR) dataset, which is currently at version 2. While DnR v2 has been an incredibly useful resource for CASS, several are…

    Submitted 25 August, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: Accepted to the 5th IEEE International Symposium on the Internet of Sounds. Camera-ready version

  10. arXiv:2407.06227  [pdf, ps, other]

    eess.SY cs.AI

    Communication and Control Co-Design in 6G: Sequential Decision-Making with LLMs

    Authors: Xianfu Chen, Celimuge Wu, Yi Shen, Yusheng Ji, Tsutomu Yoshinaga, Qiang Ni, Charilaos C. Zarakovitis, Honggang Zhang

    Abstract: This article investigates a control system within the context of sixth-generation wireless networks. The control performance optimization confronts the technical challenges that arise from the intricate interactions between communication and control sub-systems, asking for a co-design. Accounting for the system dynamics, we formulate the sequential co-design decision-makings of communication and con…

    Submitted 6 July, 2024; originally announced July 2024.

  11. arXiv:2407.02052  [pdf, other]

    eess.AS cs.SD

    The USTC-NERCSLIP Systems for The ICMC-ASR Challenge

    Authors: Minghui Wu, Luzhen Xu, Jie Zhang, Haitao Tang, Yanyan Yue, Ruizhi Liao, Jintao Zhao, Zhengzhe Zhang, Yichi Wang, Haoyin Yan, Hongliang Yu, Tongle Ma, Jiachen Liu, Chongliang Wu, Yongchao Li, Yanyong Zhang, Xin Fang, Yue Zhang

    Abstract: This report describes the submitted system to the In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) challenge, which considers the ASR task with multi-speaker overlapping and Mandarin accent dynamics in the ICMC case. We implement the front-end speaker diarization using the self-supervised learning representation based multi-speaker embedding and beamforming using the speaker position,…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted at ICASSP 2024

  12. arXiv:2407.00129  [pdf]

    eess.IV cs.AI cs.HC

    Multimodal Learning and Cognitive Processes in Radiology: MedGaze for Chest X-ray Scanpath Prediction

    Authors: Akash Awasthi, Ngan Le, Zhigang Deng, Rishi Agrawal, Carol C. Wu, Hien Van Nguyen

    Abstract: Predicting human gaze behavior within computer vision is integral for developing interactive systems that can anticipate user attention, address fundamental questions in cognitive science, and hold implications for fields like human-computer interaction (HCI) and augmented/virtual reality (AR/VR) systems. Despite methodologies introduced for modeling human eye gaze behavior, applying these models…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: Submitted to the Journal

  13. arXiv:2406.19686  [pdf]

    eess.IV cs.AI cs.CV cs.HC

    Enhancing Radiological Diagnosis: A Collaborative Approach Integrating AI and Human Expertise for Visual Miss Correction

    Authors: Akash Awasthi, Ngan Le, Zhigang Deng, Carol C. Wu, Hien Van Nguyen

    Abstract: Human-AI collaboration to identify and correct perceptual errors in chest radiographs has not been previously explored. This study aimed to develop a collaborative AI system, CoRaX, which integrates eye gaze data and radiology reports to enhance diagnostic accuracy in chest radiology by pinpointing perceptual errors and refining the decision-making process. Using public datasets REFLACX and EGD-CX…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Under Review in Journal

  14. arXiv:2406.11519  [pdf, other]

    cs.CV eess.IV

    HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model

    Authors: Di Wang, Meiqi Hu, Yao Jin, Yuchun Miao, Jiaqi Yang, Yichu Xu, Xiaolei Qin, Jiaqi Ma, Lingyu Sun, Chenxing Li, Chuan Fu, Hongruixuan Chen, Chengxi Han, Naoto Yokoya, Jing Zhang, Minqiang Xu, Lin Liu, Lefei Zhang, Chen Wu, Bo Du, Dacheng Tao, Liangpei Zhang

    Abstract: Foundation models (FMs) are revolutionizing the analysis and understanding of remote sensing (RS) scenes, including aerial RGB, multispectral, and SAR images. However, hyperspectral images (HSIs), which are rich in spectral information, have not seen much application of FMs, with existing methods often restricted to specific tasks and lacking generality. To fill this gap, we introduce HyperSIGMA,…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: The code and models will be released at https://github.com/WHU-Sigma/HyperSIGMA

  15. arXiv:2406.10873  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    Optimizing Automatic Speech Assessment: W-RankSim Regularization and Hybrid Feature Fusion Strategies

    Authors: Chung-Wen Wu, Berlin Chen

    Abstract: Automatic Speech Assessment (ASA) has seen notable advancements with the utilization of self-supervised features (SSL) in recent research. However, a key challenge in ASA lies in the imbalanced distribution of data, particularly evident in English test datasets. To address this challenge, we approach ASA as an ordinal classification task, introducing Weighted Vectors Ranking Similarity (W-RankSim)…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  16. arXiv:2406.09569  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Speech ReaLLM -- Real-time Streaming Speech Recognition with Multimodal LLMs by Teaching the Flow of Time

    Authors: Frank Seide, Morrie Doulaty, Yangyang Shi, Yashesh Gaur, Junteng Jia, Chunyang Wu

    Abstract: We introduce Speech ReaLLM, a new ASR architecture that marries "decoder-only" ASR with the RNN-T to make multimodal LLM architectures capable of real-time streaming. This is the first "decoder-only" ASR architecture designed to handle continuous audio without explicit end-pointing. Speech ReaLLM is a special case of the more general ReaLLM ("real-time LLM") approach, also introduced here for the…

    Submitted 13 June, 2024; originally announced June 2024.

  17. arXiv:2406.04262  [pdf, other]

    eess.SP

    Near-field Beam Training with Sparse DFT Codebook

    Authors: Cong Zhou, Chenyu Wu, Changsheng You, Shuo Shi

    Abstract: Extremely large-scale array (XL-array) has emerged as one promising technology to improve the spectral efficiency and spatial resolution of future sixth generation (6G) wireless systems. The upsurge in the number of antennas renders communication users more likely to be located in the near-field region, which requires a more accurate spherical (instead of planar) wavefront propagation modeling…

    Submitted 18 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: In this paper, we propose a novel sparse DFT codebook to reduce near-field beam training overhead, which is equivalent to sparsely activating the dense array

  18. arXiv:2405.19356  [pdf, other]

    eess.SP cs.AI cs.LG cs.RO

    An LSTM Feature Imitation Network for Hand Movement Recognition from sEMG Signals

    Authors: Chuheng Wu, S. Farokh Atashzar, Mohammad M. Ghassemi, Tuka Alhanai

    Abstract: Surface Electromyography (sEMG) is a non-invasive signal that is used in the recognition of hand movement patterns, the diagnosis of diseases, and the robust control of prostheses. Despite the remarkable success of recent end-to-end Deep Learning approaches, they are still limited by the need for large amounts of labeled data. To alleviate the requirement for big data, researchers utilize Feature…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: This work has been submitted to RA-L and is under review

  19. arXiv:2405.18526  [pdf, other]

    eess.SY physics.soc-ph

    Unlocking the Potential of Renewable Energy Through Curtailment Prediction

    Authors: Bilge Acun, Brent Morgan, Henry Richardson, Nat Steinsultz, Carole-Jean Wu

    Abstract: A significant fraction (5-15%) of renewable energy generated goes into waste in the grids around the world today due to oversupply issues and transmission constraints. Being able to predict when and where renewable curtailment occurs would improve renewable utilization. The core of this work is to enable the machine learning community to help decarbonize electricity grids by unlocking the potentia…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: The work was presented as a part of the Climate Change AI workshop at NeurIPS 2023

  20. arXiv:2405.17100  [pdf, other]

    cs.CR cs.SD eess.AS

    Sok: Comprehensive Security Overview, Challenges, and Future Directions of Voice-Controlled Systems

    Authors: Haozhe Xu, Cong Wu, Yangyang Gu, Xingcan Shang, Jing Chen, Kun He, Ruiying Du

    Abstract: The integration of Voice Control Systems (VCS) into smart devices and their growing presence in daily life accentuate the importance of their security. Current research has uncovered numerous vulnerabilities in VCS, presenting significant risks to user privacy and security. However, a cohesive and systematic examination of these vulnerabilities and the corresponding solutions is still absent. This…

    Submitted 27 May, 2024; originally announced May 2024.

  21. arXiv:2405.11541  [pdf, other]

    cs.IT eess.SP

    R-NeRF: Neural Radiance Fields for Modeling RIS-enabled Wireless Environments

    Authors: Huiying Yang, Zihan Jin, Chenhao Wu, Rujing Xiong, Robert Caiming Qiu, Zenan Ling

    Abstract: Recently, ray tracing has gained renewed interest with the advent of Reflective Intelligent Surfaces (RIS) technology, a key enabler of 6G wireless communications due to its capability of intelligent manipulation of electromagnetic waves. However, accurately modeling RIS-enabled wireless environments poses significant challenges due to the complex variations caused by various environmental factors…

    Submitted 19 May, 2024; originally announced May 2024.

  22. arXiv:2405.11155  [pdf, other]

    eess.SY cs.CC

    Inner-approximate Reachability Computation via Zonotopic Boundary Analysis

    Authors: Dejin Ren, Zhen Liang, Chenyu Wu, Jianqiang Ding, Taoran Wu, Bai Xue

    Abstract: Inner-approximate reachability analysis involves calculating subsets of reachable sets, known as inner-approximations. This analysis is crucial in the fields of dynamic systems analysis and control theory as it provides a reliable estimation of the set of states that a system can reach from given initial states at a specific time instant. In this paper, we study the inner-approximate reachability…

    Submitted 21 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

    Comments: The extended version of the paper accepted by CAV 2024

  23. arXiv:2405.09539  [pdf, ps, other]

    eess.IV cs.CV cs.MM

    MMFusion: Multi-modality Diffusion Model for Lymph Node Metastasis Diagnosis in Esophageal Cancer

    Authors: Chengyu Wu, Chengkai Wang, Yaqi Wang, Huiyu Zhou, Yatao Zhang, Qifeng Wang, Shuai Wang

    Abstract: Esophageal cancer is one of the most common types of cancer worldwide and ranks sixth in cancer-related mortality. Accurate computer-assisted diagnosis of cancer progression can help physicians effectively customize personalized treatment plans. Currently, CT-based cancer diagnosis methods have received much attention for their comprehensive ability to examine patients' conditions. However, multi-…

    Submitted 16 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

    Comments: Early accepted to MICCAI 2024 (6/6/5)

  24. arXiv:2405.07717  [pdf, other]

    eess.IV

    On the Adversarial Robustness of Learning-based Image Compression Against Rate-Distortion Attacks

    Authors: Chenhao Wu, Qingbo Wu, Haoran Wei, Shuai Chen, Lei Wang, King Ngi Ngan, Fanman Meng, Hongliang Li

    Abstract: Despite demonstrating superior rate-distortion (RD) performance, learning-based image compression (LIC) algorithms have been found to be vulnerable to malicious perturbations in recent studies. However, the adversarial attacks considered in existing literature remain divergent from real-world scenarios, both in terms of the attack direction and bitrate. Additionally, existing methods focus solely…

    Submitted 4 July, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

  25. arXiv:2405.03141  [pdf, other]

    eess.IV cs.AI cs.CV physics.med-ph

    Automatic Ultrasound Curve Angle Measurement via Affinity Clustering for Adolescent Idiopathic Scoliosis Evaluation

    Authors: Yihao Zhou, Timothy Tin-Yan Lee, Kelly Ka-Lee Lai, Chonglin Wu, Hin Ting Lau, De Yang, Chui-Yi Chan, Winnie Chiu-Wing Chu, Jack Chun-Yiu Cheng, Tsz-Ping Lam, Yong-Ping Zheng

    Abstract: The current clinical gold standard for evaluating adolescent idiopathic scoliosis (AIS) is X-ray radiography, using Cobb angle measurement. However, the frequent monitoring of the AIS progression using X-rays poses a challenge due to the cumulative radiation exposure. Although 3D ultrasound has been validated as a reliable and radiation-free alternative for scoliosis assessment, the process of mea…

    Submitted 6 May, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

  26. arXiv:2404.15854  [pdf, other]

    cs.CR cs.LG cs.SD eess.AS

    CLAD: Robust Audio Deepfake Detection Against Manipulation Attacks with Contrastive Learning

    Authors: Haolin Wu, Jing Chen, Ruiying Du, Cong Wu, Kun He, Xingcan Shang, Hao Ren, Guowen Xu

    Abstract: The increasing prevalence of audio deepfakes poses significant security threats, necessitating robust detection methods. While existing detection systems exhibit promise, their robustness against malicious audio manipulations remains underexplored. To bridge the gap, we undertake the first comprehensive study of the susceptibility of the most widely adopted audio deepfake detectors to manipulation…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: Submitted to IEEE TDSC

  27. arXiv:2404.11959  [pdf, other]

    eess.SY

    Segmented Model-Based Hydrogen Delivery Control for PEM Fuel Cells: a Port-Hamiltonian Approach

    Authors: Lalitesh Kumar, Jian Chen, Chengshuai Wu, Yuzhu Chen, Arjan van der Schaft

    Abstract: This paper proposes an extended interconnection and damping assignment passivity-based control technique (IDA-PBC) to control the pressure dynamics in the fuel delivery subsystem (FDS) of proton exchange membrane fuel cells. The fuel cell stack is a distributed parameter model which can be modeled by partial differential equations (PDEs). In this paper, the segmentation concept is used to approxima…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 12 pages, 11 figures

  28. Change Guiding Network: Incorporating Change Prior to Guide Change Detection in Remote Sensing Imagery

    Authors: Chengxi Han, Chen Wu, Haonan Guo, Meiqi Hu, Jiepan Li, Hongruixuan Chen

    Abstract: The rapid advancement of automated artificial intelligence algorithms and remote sensing instruments has benefited change detection (CD) tasks. However, there is still a lot of space to study for precise detection, especially the edge integrity and internal holes phenomenon of change features. In order to solve these problems, we design the Change Guiding Network (CGNet), to tackle the insufficien…

    Submitted 14 April, 2024; originally announced April 2024.

  29. arXiv:2404.09140  [pdf, other]

    cs.LG cs.IT eess.SP

    RF-Diffusion: Radio Signal Generation via Time-Frequency Diffusion

    Authors: Guoxuan Chi, Zheng Yang, Chenshu Wu, Jingao Xu, Yuchong Gao, Yunhao Liu, Tony Xiao Han

    Abstract: As AIGC shines in CV and NLP, its potential in the wireless domain has also emerged in recent years. Yet, existing RF-oriented generative solutions are ill-suited for generating high-quality, time-series RF data due to limited representation capabilities. In this work, inspired by the stellar achievements of the diffusion model in CV and NLP, we adapt it to the RF domain and propose RF-Dif…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: Accepted by MobiCom 2024

    ACM Class: I.2.0

  30. arXiv:2404.01716  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG

    Effective internal language model training and fusion for factorized transducer model

    Authors: Jinxi Guo, Niko Moritz, Yingyi Ma, Frank Seide, Chunyang Wu, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer

    Abstract: The internal language model (ILM) of the neural transducer has been widely studied. In most prior work, it is mainly used for estimating the ILM score and is subsequently subtracted during inference to facilitate improved integration with external language models. Recently, various factorized transducer models have been proposed, which explicitly embrace a standalone internal language model for…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted to ICASSP 2024

  31. arXiv:2403.15528  [pdf, other]

    eess.IV cs.AI cs.CV

    Evaluating GPT-4 with Vision on Detection of Radiological Findings on Chest Radiographs

    Authors: Yiliang Zhou, Hanley Ong, Patrick Kennedy, Carol Wu, Jacob Kazam, Keith Hentel, Adam Flanders, George Shih, Yifan Peng

    Abstract: The study examines the application of GPT-4V, a multi-modal large language model equipped with visual recognition, in detecting radiological findings from a set of 100 chest radiographs and suggests that GPT-4V is currently not ready for real-world diagnostic usage in interpreting chest radiographs.

    Submitted 12 May, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

  32. Multi-modal Heart Failure Risk Estimation based on Short ECG and Sampled Long-Term HRV

    Authors: Sergio González, Abel Ko-Chun Yi, Wan-Ting Hsieh, Wei-Chao Chen, Chun-Li Wang, Victor Chien-Chia Wu, Shang-Hung Chang

    Abstract: Cardiovascular diseases, including Heart Failure (HF), remain a leading global cause of mortality, often evading early detection. In this context, accessible and effective risk assessment is indispensable. Traditional approaches rely on resource-intensive diagnostic tests, typically administered after the onset of symptoms. The widespread availability of electrocardiogram (ECG) technology and the…

    Submitted 29 February, 2024; originally announced March 2024.

    Journal ref: S. González, A. K.-C. Yi, W.-T. Hsieh, W.-C. Chen, C.-L. Wang, V. C.-C. Wu, S.-H. Chang, Multi-modal heart failure risk estimation based on short ECG and sampled long-term HRV, Information Fusion 107 (2024) 102337

  33. arXiv:2403.04232  [pdf, other]

    cs.RO cs.AI cs.LG cs.MA eess.SY

    Generalizing Cooperative Eco-driving via Multi-residual Task Learning

    Authors: Vindula Jayawardana, Sirui Li, Cathy Wu, Yashar Farid, Kentaro Oguchi

    Abstract: Conventional control, such as model-based control, is commonly utilized in autonomous driving due to its efficiency and reliability. However, real-world autonomous driving contends with a multitude of diverse traffic scenarios that are challenging for these planning algorithms. Model-free Deep Reinforcement Learning (DRL) presents a promising avenue in this direction, but learning DRL control poli…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: Accepted for publication at ICRA 2024

  34. arXiv:2402.17167  [pdf, ps, other]

    eess.SY

    Converse Barrier Certificates for Finite-time Safety Verification of Continuous-time Perturbed Deterministic Systems

    Authors: Yonghan Li, Chenyu Wu, Taoran Wu, Shijie Wang, Bai Xue

    Abstract: In this paper, we investigate the problem of verifying the finite-time safety of continuous-time perturbed deterministic systems represented by ordinary differential equations in the presence of measurable disturbances. Given a finite time horizon, if the system is safe, it, starting from a compact initial set, will remain within an open and bounded safe region throughout the specified time horizo…

    Submitted 26 February, 2024; originally announced February 2024.

  35. arXiv:2402.09097  [pdf, other]

    cs.RO cs.AI eess.SY

    A Digital Twin prototype for traffic sign recognition of a learning-enabled autonomous vehicle

    Authors: Mohamed AbdElSalam, Loai Ali, Saddek Bensalem, Weicheng He, Panagiotis Katsaros, Nikolaos Kekatos, Doron Peled, Anastasios Temperekidis, Changshun Wu

    Abstract: In this paper, we present a novel digital twin prototype for a learning-enabled self-driving vehicle. The primary objective of this digital twin is to perform traffic sign recognition and lane keeping. The digital twin architecture relies on co-simulation and uses the Functional Mock-up Interface and SystemC Transaction Level Modeling standards. The digital twin consists of four clients, i) a vehi…

    Submitted 14 February, 2024; originally announced February 2024.

  36. arXiv:2402.04584  [pdf, other]

    eess.IV cs.CV

    Troublemaker Learning for Low-Light Image Enhancement

    Authors: Yinghao Song, Zhiyuan Cao, Wanhong Xiang, Sifan Long, Bo Yang, Hongwei Ge, Yanchun Liang, Chunguo Wu

    Abstract: Low-light image enhancement (LLIE) restores the color and brightness of underexposed images. Supervised methods suffer from high costs in collecting low/normal-light image pairs. Unsupervised methods invest substantial effort in crafting complex loss functions. We address these two challenges through the proposed TroubleMaker Learning (TML) strategy, which employs normal-light images as inputs for…

    Submitted 2 March, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

  37. arXiv:2401.08913  [pdf, other]

    cs.CV eess.IV

    Efficient Image Super-Resolution via Symmetric Visual Attention Network

    Authors: Chengxu Wu, Qinrui Fan, Shu Hu, Xi Wu, Xin Wang, Jing Hu

    Abstract: An important development direction in the Single-Image Super-Resolution (SISR) algorithms is to improve the efficiency of the algorithms. Recently, efficient Super-Resolution (SR) research focuses on reducing model complexity and improving efficiency through improved deep small kernel convolution, leading to a small receptive field. The large receptive field obtained by large kernel convolution ca…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: 13 pages, 4 figures

  38. arXiv:2401.04650  [pdf, other]

    cs.RO eess.SY

    Hold 'em and Fold 'em: Towards Human-scale, Feedback-Controlled Soft Origami Robots

    Authors: Immanuel Ampomah Mensah, Jessica Healey, Celina Wu, Andrea Lacunza, Nathaniel Hanson, Kristen L. Dorsey

    Abstract: An underdeveloped capability in soft robotics is proprioceptive feedback control, where soft actuators can be sensed and controlled using only sensors on the robot's body. Additionally, soft actuators are often unable to support human-scale loads due to the extremely compliant materials in use. Developing both feedback control and the ability to actuate under large loads (e.g. 500 N) are key capac…

    Submitted 18 January, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

  39. arXiv:2401.03150  [pdf, other]

    eess.IV

    O-PRESS: Boosting OCT axial resolution with Prior guidance, Recurrence, and Equivariant Self-Supervision

    Authors: Kaiyan Li, Jingyuan Yang, Wenxuan Liang, Xingde Li, Chenxi Zhang, Lulu Chen, Chan Wu, Xiao Zhang, Zhiyan Xu, Yuelin Wang, Lihui Meng, Yue Zhang, Youxin Chen, S. Kevin Zhou

    Abstract: Optical coherence tomography (OCT) is a noninvasive technology that enables real-time imaging of tissue microanatomies. The axial resolution of OCT is intrinsically constrained by the spectral bandwidth of the employed light source while maintaining a fixed center wavelength for a specific application. Physically extending this bandwidth faces strong limitations and requires a substantial cost. We…

    Submitted 6 January, 2024; originally announced January 2024.

  40. arXiv:2401.00197  [pdf, other]

    eess.AS

    ODAQ: Open Dataset of Audio Quality

    Authors: Matteo Torcoli, Chih-Wei Wu, Sascha Dick, Phillip A. Williams, Mhd Modar Halimeh, William Wolcott, Emanuel A. P. Habets

    Abstract: Research into the prediction and analysis of perceived audio quality is hampered by the scarcity of openly available datasets of audio signals accompanied by corresponding subjective quality scores. To address this problem, we present the Open Dataset of Audio Quality (ODAQ), a new dataset containing the results of a MUSHRA listening test conducted with expert listeners from 2 international labora…

    Submitted 30 December, 2023; originally announced January 2024.

    Comments: Accepted paper. IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), Seoul, Korea, April 2024

  41. arXiv:2312.17183  [pdf, other]

    eess.IV cs.CV

    One Model to Rule them All: Towards Universal Segmentation for Medical Images with Text Prompts

    Authors: Ziheng Zhao, Yao Zhang, Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie

    Abstract: In this study, we aim to build up a model that can Segment Anything in radiology scans, driven by Text prompts, termed as SAT. Our main contributions are three folds: (i) for dataset construction, we construct the first multi-modal knowledge tree on human anatomy, including 6502 anatomical terminologies; Then we build up the largest and most comprehensive segmentation dataset for training, by coll…

    Submitted 11 July, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

    Comments: 59 pages

  42. arXiv:2312.10343  [pdf, other]

    eess.SP cs.AR cs.LG cs.NE

    In-Sensor Radio Frequency Computing for Energy-Efficient Intelligent Radar

    Authors: Yang Sui, Minning Zhu, Lingyi Huang, Chung-Tse Michael Wu, Bo Yuan

    Abstract: Radio Frequency Neural Networks (RFNNs) have demonstrated advantages in realizing intelligent applications across various domains. However, as the model size of deep neural networks rapidly increases, implementing large-scale RFNN in practice requires an extensive number of RF interferometers and consumes a substantial amount of energy. To address this challenge, we propose to utilize low-rank dec…

    Submitted 16 December, 2023; originally announced December 2023.

  43. arXiv:2312.09436  [pdf, other]

    cs.RO cs.AI cs.LG eess.SY

    Temporal Transfer Learning for Traffic Optimization with Coarse-grained Advisory Autonomy

    Authors: Jung-Hoon Cho, Sirui Li, Jeongyun Kim, Cathy Wu

    Abstract: The recent development of connected and automated vehicle (CAV) technologies has spurred investigations to optimize dense urban traffic to maximize vehicle speed and throughput. This paper explores advisory autonomy, in which real-time driving advisories are issued to the human drivers, thus achieving near-term performance of automated vehicles. Due to the complexity of traffic systems, recent stu…

    Submitted 1 August, 2024; v1 submitted 27 November, 2023; originally announced December 2023.

    Comments: 18 pages, 12 figures

  44. arXiv:2312.09429  [pdf]

    eess.SP cs.LG

    Deep Learning-Enabled Swallowing Monitoring and Postoperative Recovery Biosensing System

    Authors: Chih-Ning Tsai, Pei-Wen Yang, Tzu-Yen Huang, Jung-Chih Chen, Hsin-Yi Tseng, Che-Wei Wu, Amrit Sarmah, Tzu-En Lin

    Abstract: This study introduces an innovative 3D printed dry electrode tailored for biosensing in postoperative recovery scenarios. Fabricated through a drop coating process, the electrode incorporates a novel 2D material.

    Submitted 24 November, 2023; originally announced December 2023.

    Comments: The abstract could not be uploaded in full

    MSC Class: NA; ACM Class: A.0

  45. arXiv:2312.08343  [pdf]

    eess.IV cs.CV q-bio.QM

    Enhancing CT Image synthesis from multi-modal MRI data based on a multi-task neural network framework

    Authors: Zhuoyao Xin, Christopher Wu, Dong Liu, Chunming Gu, Jia Guo, Jun Hua

    Abstract: Image segmentation, real-value prediction, and cross-modal translation are critical challenges in medical imaging. In this study, we propose a versatile multi-task neural network framework, based on an enhanced Transformer U-Net architecture, capable of simultaneously, selectively, and adaptively addressing these medical image tasks. Validation is performed on a public repository of human brain MR…

    Submitted 17 December, 2023; v1 submitted 13 December, 2023; originally announced December 2023.

    Comments: 4 pages, 3 figures, 2 tables

  46. arXiv:2311.12581  [pdf, other]

    eess.IV cs.CV

    A Region of Interest Focused Triple UNet Architecture for Skin Lesion Segmentation

    Authors: Guoqing Liu, Yu Guo, Caiying Wu, Guoqing Chen, Barintag Saheya, Qiyu Jin

    Abstract: Skin lesion segmentation is of great significance for skin lesion analysis and subsequent treatment. It is still a challenging task due to the irregular and fuzzy lesion borders, and diversity of skin lesions. In this paper, we propose Triple-UNet to automatically segment skin lesions. It is an organic combination of three UNet architectures with suitable modules. In order to concatenate the first…

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: 15 pages, 5 figures

    Journal ref: International Journal of Imaging Systems and Technology, 2024, 34(3):e23090

  47. arXiv:2311.09850  [pdf, other]

    cs.IT eess.SP

    Semantic-Relay-Aided Text Transmission: Placement Optimization and Bandwidth Allocation

    Authors: Tianyu Liu, Changsheng You, Zeyang Hu, Chenyu Wu, Yi Gong, Kaibin Huang

    Abstract: Semantic communication has emerged as a promising technology to break the Shannon limit by extracting the meaning of source data and sending relevant semantic information only. However, some mobile devices may have limited computation and storage resources, which renders it difficult to deploy and implement the resource-demanding deep learning based semantic encoder/decoder. To tackle this challen…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: 6 pages, 4 figures, accepted for IEEE Global Communication Conference (GLOBECOM) 2023 Workshop

  48. arXiv:2311.03682  [pdf, ps, other]

    eess.SY cs.SI math.OC

    Incentive Design for Eco-driving in Urban Transportation Networks

    Authors: M. Umar B. Niazi, Jung-Hoon Cho, Munther A. Dahleh, Roy Dong, Cathy Wu

    Abstract: Eco-driving emerges as a cost-effective and efficient strategy to mitigate greenhouse gas emissions in urban transportation networks. Acknowledging the persuasive influence of incentives in shaping driver behavior, this paper presents the 'eco-planner,' a digital platform devised to promote eco-driving practices in urban transportation. At the outset of their trips, users provide the platform with…

    Submitted 16 May, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

  49. arXiv:2310.08960  [pdf, other]

    eess.SP

    A unified framework for STAR-RIS coefficients optimization

    Authors: Hancheng Zhu, Yuanwei Liu, Yik Chung Wu, Vincent K. N. Lau

    Abstract: Simultaneously transmitting and reflecting (STAR) reconfigurable intelligent surface (RIS), which serves users located on both sides of the surface, has recently emerged as a promising enhancement to the traditional reflective only RIS. Due to the lack of a unified comparison of communication systems equipped with different modes of STAR-RIS and the performance degradation caused by the constraint…

    Submitted 13 October, 2023; originally announced October 2023.

  50. arXiv:2310.07689  [pdf, other]

    eess.SY

    Hybrid System Stability Analysis of Multi-Lane Mixed-Autonomy Traffic

    Authors: Sirui Li, Roy Dong, Cathy Wu

    Abstract: Autonomous vehicles (AVs) hold vast potential to enhance transportation systems by reducing congestion, improving safety, and lowering emissions. AV controls lead to emergent traffic phenomena; one such intriguing phenomenon is traffic breaks (rolling roadblocks), where a single AV efficiently stabilizes multiple lanes through frequent lane switching, similar to the highway patrolling officers wea…

    Submitted 11 October, 2023; originally announced October 2023.