
Showing 1–50 of 376 results for author: Wu, J

Searching in archive eess.

  1. arXiv:2408.11405  [pdf, other]

    cs.SD eess.AS

    DDSP Guitar Amp: Interpretable Guitar Amplifier Modeling

    Authors: Yen-Tung Yeh, Yu-Hua Chen, Yuan-Chiao Cheng, Jui-Te Wu, Jun-Jie Fu, Yi-Fan Yeh, Yi-Hsuan Yang

    Abstract: Neural network models for guitar amplifier emulation, while being effective, often demand high computational cost and lack interpretability. Drawing ideas from physical amplifier design, this paper aims to address these issues with a new differentiable digital signal processing (DDSP)-based model, called "DDSP guitar amp," that models the four components of a guitar amp (i.e., preamp, tone stack…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: Preprint paper

  2. arXiv:2408.08121  [pdf]

    eess.SY

    Optimizing Highway Ramp Merge Safety and Efficiency via Spatio-Temporal Cooperative Control and Vehicle-Road Coordination

    Authors: Ting Peng, Xiaoxue Xu, Yuan Li, Jie Wu, Tao Li, Xiang Dong, Yincai Cai, Peng Wu

    Abstract: In view of existing automatic driving, it is difficult to accurately and timely obtain the status and driving intention of other vehicles. The safety risk and urgency of autonomous vehicles in the absence of collision are evaluated. To ensure safety and improve road efficiency, a method of pre-compiling the spatio-temporal trajectory of vehicles is established to eliminate conflicts between vehicl…

    Submitted 15 August, 2024; originally announced August 2024.

  3. arXiv:2408.07931  [pdf, other]

    cs.CV cs.AI cs.RO eess.IV

    Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning

    Authors: Haofeng Liu, Erli Zhang, Junde Wu, Mingxuan Hong, Yueming Jin

    Abstract: Surgical video segmentation is a critical task in computer-assisted surgery and is vital for enhancing surgical quality and patient outcomes. Recently, the Segment Anything Model 2 (SAM2) framework has shown superior advancements in image and video segmentation. However, SAM2 struggles with efficiency due to the high computational demands of processing high-resolution images and complex and long-r…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: 16 pages, 2 figures

  4. arXiv:2408.07325  [pdf, other]

    eess.IV cs.GR

    RoCoSDF: Row-Column Scanned Neural Signed Distance Fields for Freehand 3D Ultrasound Imaging Shape Reconstruction

    Authors: Hongbo Chen, Yuchong Gao, Shuhang Zhang, Jiangjie Wu, Yuexin Ma, Rui Zheng

    Abstract: The reconstruction of high-quality shape geometry is crucial for developing freehand 3D ultrasound imaging. However, the shape reconstruction of multi-view ultrasound data remains challenging due to the elevation distortion caused by thick transducer probes. In this paper, we present a novel learning-based framework RoCoSDF, which can effectively generate an implicit surface through continuous sha…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: Accepted by MICCAI 2024

  5. arXiv:2408.01620  [pdf, other]

    eess.IV cs.CV

    MedUHIP: Towards Human-In-the-Loop Medical Segmentation

    Authors: Jiayuan Zhu, Junde Wu

    Abstract: Although segmenting natural images has shown impressive performance, these techniques cannot be directly applied to medical image segmentation. Medical image segmentation is particularly complicated by inherent uncertainties. For instance, the ambiguous boundaries of tissues can lead to diverse but plausible annotations from different clinicians. These uncertainties cause significant discrepancies…

    Submitted 2 August, 2024; originally announced August 2024.

  6. arXiv:2407.20893  [pdf, other]

    cs.LG cs.AI eess.SP

    MambaCapsule: Towards Transparent Cardiac Disease Diagnosis with Electrocardiography Using Mamba Capsule Network

    Authors: Yinlong Xu, Xiaoqiang Liu, Zitai Kong, Yixuan Wu, Yue Wang, Yingzhou Lu, Honghao Gao, Jian Wu, Hongxia Xu

    Abstract: Cardiac arrhythmia, a condition characterized by irregular heartbeats, often serves as an early indication of various heart ailments. With the advent of deep learning, numerous innovative models have been introduced for diagnosing arrhythmias using Electrocardiogram (ECG) signals. However, recent studies solely focus on the performance of models, neglecting the interpretation of their results. Thi…

    Submitted 30 July, 2024; originally announced July 2024.

  7. arXiv:2407.20445  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Futga: Towards Fine-grained Music Understanding through Temporally-enhanced Generative Augmentation

    Authors: Junda Wu, Zachary Novack, Amit Namburi, Jiaheng Dai, Hao-Wen Dong, Zhouhang Xie, Carol Chen, Julian McAuley

    Abstract: Existing music captioning methods are limited to generating concise global descriptions of short music clips, which fail to capture fine-grained musical characteristics and time-aware musical changes. To address these limitations, we propose FUTGA, a model equipped with fine-grained music understanding capabilities through learning from generative augmentation with temporal compositions. We lever…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: 6 pages

  8. arXiv:2407.19763  [pdf, other]

    eess.IV cs.CV

    TeleOR: Real-time Telemedicine System for Full-Scene Operating Room

    Authors: Yixuan Wu, Kaiyuan Hu, Qian Shao, Jintai Chen, Danny Z. Chen, Jian Wu

    Abstract: The advent of telemedicine represents a transformative development in leveraging technology to extend the reach of specialized medical expertise to remote surgeries, a field where the immediacy of expert guidance is paramount. However, the intricate dynamics of Operating Room (OR) scenes pose unique challenges for telemedicine, particularly in achieving high-fidelity, real-time scene reconstruction…

    Submitted 29 July, 2024; originally announced July 2024.

  9. arXiv:2407.19316  [pdf]

    eess.IV cs.AI cs.CV

    AResNet-ViT: A Hybrid CNN-Transformer Network for Benign and Malignant Breast Nodule Classification in Ultrasound Images

    Authors: Xin Zhao, Qianqian Zhu, Jialing Wu

    Abstract: To address the challenges of similarity between lesions and surrounding tissues, overlapping appearances of partially benign and malignant nodules, and difficulty in classification, a deep learning network that integrates CNN and Transformer is proposed for the classification of benign and malignant breast lesions in ultrasound images. This network adopts a dual-branch architecture for local-globa…

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: 12 pages, 3 figures

  10. arXiv:2407.18613  [pdf]

    cs.CV eess.IV

    Dilated Strip Attention Network for Image Restoration

    Authors: Fangwei Hao, Jiesheng Wu, Ji Du, Yinjie Wang, Jing Xu

    Abstract: Image restoration is a long-standing task that seeks to recover the latent sharp image from its deteriorated counterpart. Due to the robust capacity of self-attention to capture long-range dependencies, transformer-based methods or some attention-based convolutional neural networks have demonstrated promising results on many image restoration tasks in recent years. However, existing attention modu…

    Submitted 26 July, 2024; originally announced July 2024.

  11. arXiv:2407.16639  [pdf, other]

    cs.SD eess.AS

    Distortion Recovery: A Two-Stage Method for Guitar Effect Removal

    Authors: Ying-Shuo Lee, Yueh-Po Peng, Jui-Te Wu, Ming Cheng, Li Su, Yi-Hsuan Yang

    Abstract: Removing audio effects from electric guitar recordings makes it easier for post-production and sound editing. An audio distortion recovery model not only improves the clarity of the guitar sounds but also opens up new opportunities for creative adjustments in mixing and mastering. While progress has been made in creating such models, previous efforts have largely focused on synthetic distortions…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: DAFx 2024

  12. arXiv:2407.16554  [pdf, other]

    cs.MM cs.CV cs.SD eess.AS

    Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery Detection and Localization

    Authors: Junyan Wu, Wei Lu, Xiangyang Luo, Rui Yang, Qian Wang, Xiaochun Cao

    Abstract: Recently, a novel form of audio partial forgery has posed challenges to its forensics, requiring advanced countermeasures to detect subtle forgery manipulations within long-duration audio. However, existing countermeasures still serve a classification purpose and fail to perform meaningful analysis of the start and end timestamps of partial forgery segments. To address this challenge, we introduce…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 9 pages, 3 figures. This paper has been accepted for ACM MM 2024

    MSC Class: 68T07; 68T10 ACM Class: I.2; I.5

  13. arXiv:2407.14894  [pdf, other]

    eess.SY

    A Holistic Optimization Framework for Energy Efficient UAV-assisted Fog Computing: Attitude Control, Trajectory Planning and Task Assignment

    Authors: Shuaijun Liu, Jinqiu Du, Yaxin Zheng, Jiaying Yin, Yuhui Deng, Jingjin Wu

    Abstract: Unmanned Aerial Vehicles (UAVs) have significantly enhanced fog computing by acting as both flexible computation platforms and communication mobile relays. In this paper, we propose a holistic framework that jointly optimizes the total latency and energy consumption for UAV-assisted fog computing in a three-dimensional spatial domain with varying terrain elevations and dynamic task generations. Ou…

    Submitted 5 August, 2024; v1 submitted 20 July, 2024; originally announced July 2024.

    Comments: 14 pages, 10 figures

  14. arXiv:2407.12534  [pdf, ps, other]

    eess.SP

    RIS-Based Self-Interference Cancellation for Full-Duplex Broadband Transmission

    Authors: Jiayan Wu, Wenchi Cheng, Jianyu Wang, Jingqing Wang, Wei Zhang

    Abstract: Full-duplex (FD) is an attractive technology that can significantly boost the throughput of wireless communications. However, it is limited by the severe self-interference (SI) from the transmitter to the local receiver. In this paper, we propose a new SI cancellation (SIC) scheme based on reconfigurable intelligent surface (RIS), where small RISs are deployed inside FD devices to enhance SIC capa…

    Submitted 17 July, 2024; originally announced July 2024.

  15. arXiv:2407.10646  [pdf, other]

    cs.SD eess.AS

    Towards zero-shot amplifier modeling: One-to-many amplifier modeling via tone embedding control

    Authors: Yu-Hua Chen, Yen-Tung Yeh, Yuan-Chiao Cheng, Jui-Te Wu, Yu-Hsiang Ho, Jyh-Shing Roger Jang, Yi-Hsuan Yang

    Abstract: Replicating analog device circuits through neural audio effect modeling has garnered increasing interest in recent years. Existing work has predominantly focused on a one-to-one emulation strategy, modeling specific devices individually. In this paper, we tackle the less-explored scenario of one-to-many emulation, utilizing conditioning mechanisms to emulate multiple guitar amplifiers through a si…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: ISMIR 2024

  16. arXiv:2407.08664  [pdf, other]

    cs.CE eess.SY

    MBD-NODE: Physics-informed data-driven modeling and simulation of constrained multibody systems

    Authors: Jingquan Wang, Shu Wang, Huzaifa Mustafa Unjhawala, Jinlong Wu, Dan Negrut

    Abstract: We describe a framework that can integrate prior physical information, e.g., the presence of kinematic constraints, to support data-driven simulation in multi-body dynamics. Unlike other approaches, e.g., Fully-connected Neural Network (FCNN) or Recurrent Neural Network (RNN)-based methods that are used to model the system states directly, the proposed approach embraces a Neural Ordinary Different…

    Submitted 11 July, 2024; originally announced July 2024.

  17. arXiv:2407.04888  [pdf, other]

    eess.IV cs.CV

    Unraveling Radiomics Complexity: Strategies for Optimal Simplicity in Predictive Modeling

    Authors: Mahdi Ait Lhaj Loutfi, Teodora Boblea Podasca, Alex Zwanenburg, Taman Upadhaya, Jorge Barrios, David R. Raleigh, William C. Chen, Dante P. I. Capaldi, Hong Zheng, Olivier Gevaert, Jing Wu, Alvin C. Silva, Paul J. Zhang, Harrison X. Bai, Jan Seuntjens, Steffen Löck, Patrick O. Richard, Olivier Morin, Caroline Reinhold, Martin Lepage, Martin Vallières

    Abstract: Background: The high dimensionality of radiomic feature sets, the variability in radiomic feature types and potentially high computational requirements all underscore the need for an effective method to identify the smallest set of predictive features for a given clinical problem. Purpose: Develop a methodology and tools to identify and explain the smallest set of predictive radiomic features. Mat…

    Submitted 5 July, 2024; originally announced July 2024.

  18. arXiv:2407.03671  [pdf]

    eess.SY

    Spatio-temporal cooperative control Method of Highway Ramp Merge Based on Vehicle-road Coordination

    Authors: Xiaoxue Xu, Maokai Lai, Haitao Zhang, Xiang Dong, Tao Li, Jie Wu, Yuan Li, Ting Peng

    Abstract: The merging area of highway ramps faces multiple challenges, including traffic congestion, collision risks, speed mismatches, driver behavior uncertainties, limited visibility, and bottleneck effects. However, autonomous vehicles engaging in depth coordination between vehicle and road in merging zones, by pre-planning and uploading travel trajectories, can significantly enhance the safety and effi…

    Submitted 4 July, 2024; originally announced July 2024.

  19. arXiv:2406.18089  [pdf, other]

    cs.SD cs.MM eess.AS

    A Study on Synthesizing Expressive Violin Performances: Approaches and Comparisons

    Authors: Tzu-Yun Hung, Jui-Te Wu, Yu-Chia Kuo, Yo-Wei Hsiao, Ting-Wei Lin, Li Su

    Abstract: Expressive music synthesis (EMS) for violin performance is a challenging task due to the disagreement among music performers in the interpretation of expressive musical terms (EMTs), scarcity of labeled recordings, and limited generalization ability of the synthesis model. These challenges create trade-offs between model effectiveness, diversity of generated results, and controllability of the syn…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 15 pages, 2 figures, 3 tables

  20. arXiv:2406.13705  [pdf, other]

    eess.IV cs.AI cs.CV

    EndoUIC: Promptable Diffusion Transformer for Unified Illumination Correction in Capsule Endoscopy

    Authors: Long Bai, Tong Chen, Qiaozhi Tan, Wan Jun Nah, Yanheng Li, Zhicheng He, Sishen Yuan, Zhen Chen, Jinlin Wu, Mobarakol Islam, Zhen Li, Hongbin Liu, Hongliang Ren

    Abstract: Wireless Capsule Endoscopy (WCE) is highly valued for its non-invasive and painless approach, though its effectiveness is compromised by uneven illumination from hardware constraints and complex internal dynamics, leading to overexposed or underexposed images. While researchers have discussed the challenges of low-light enhancement in WCE, the issue of correcting for different exposure levels rema…

    Submitted 8 July, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: To appear in MICCAI 2024. Code and dataset availability: https://github.com/longbai1006/EndoUIC

  21. arXiv:2406.13179  [pdf, other]

    cs.SD cs.AI cs.NE eess.AS

    Global-Local Convolution with Spiking Neural Networks for Energy-efficient Keyword Spotting

    Authors: Shuai Wang, Dehao Zhang, Kexin Shi, Yuchen Wang, Wenjie Wei, Jibin Wu, Malu Zhang

    Abstract: Thanks to Deep Neural Networks (DNNs), the accuracy of Keyword Spotting (KWS) has made substantial progress. However, as KWS systems are usually implemented on edge devices, energy efficiency becomes a critical requirement besides performance. Here, we take advantage of spiking neural networks' energy efficiency and propose an end-to-end lightweight KWS model. The model consists of two innovative…

    Submitted 18 June, 2024; originally announced June 2024.

  22. arXiv:2406.10137  [pdf, ps, other]

    cs.IT cs.LG eess.SP

    Compressed Sensor Caching and Collaborative Sparse Data Recovery with Anchor Alignment

    Authors: Yi-Jen Yang, Ming-Hsun Yang, Jwo-Yuh Wu, Y. -W. Peter Hong

    Abstract: This work examines the compressed sensor caching problem in wireless sensor networks and devises efficient distributed sparse data recovery algorithms to enable collaboration among multiple caches. In this problem, each cache is only allowed to access measurements from a small subset of sensors within its vicinity to reduce both cache size and data acquisition overhead. To enable reliable data rec…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: v1 was submitted to IEEE Transactions on Signal Processing on Sept. 18, 2023

  23. arXiv:2406.07532  [pdf, other]

    cs.SD cs.CV cs.LG eess.AS

    Hearing Anything Anywhere

    Authors: Mason Wang, Ryosuke Sawata, Samuel Clarke, Ruohan Gao, Shangzhe Wu, Jiajun Wu

    Abstract: Recent years have seen immense progress in 3D computer vision and computer graphics, with emerging tools that can virtualize real-world 3D environments for numerous Mixed Reality (XR) applications. However, alongside immersive visual experiences, immersive auditory experiences are equally vital to our holistic perception of an environment. In this paper, we aim to reconstruct the spatial acoustic…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: CVPR 2024. The first two authors contributed equally. Project page: https://masonlwang.com/hearinganythinganywhere/

    ACM Class: I.2.10; I.4.8

  24. arXiv:2406.06796  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO eess.SP

    FlexLoc: Conditional Neural Networks for Zero-Shot Sensor Perspective Invariance in Object Localization with Distributed Multimodal Sensors

    Authors: Jason Wu, Ziqi Wang, Xiaomin Ouyang, Ho Lyun Jeong, Colin Samplawski, Lance Kaplan, Benjamin Marlin, Mani Srivastava

    Abstract: Localization is a critical technology for various applications ranging from navigation and surveillance to assisted living. Localization systems typically fuse information from sensors viewing the scene from different perspectives to estimate the target location while also employing multiple modalities for enhanced robustness and accuracy. Recently, such systems have employed end-to-end deep neura…

    Submitted 10 June, 2024; originally announced June 2024.

  25. arXiv:2406.03882  [pdf, other]

    cs.CL cs.SD eess.AS

    Spontaneous Speech-Based Suicide Risk Detection Using Whisper and Large Language Models

    Authors: Ziyun Cui, Chang Lei, Wen Wu, Yinan Duan, Diyang Qu, Ji Wu, Runsen Chen, Chao Zhang

    Abstract: The early detection of suicide risk is important since it enables the intervention to prevent potential suicide attempts. This paper studies the automatic detection of suicide risk based on spontaneous speech from adolescents, and collects a Mandarin dataset with 15 hours of suicide speech from more than a thousand adolescents aged from ten to eighteen for our experiments. To leverage the diverse…

    Submitted 9 July, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  26. arXiv:2406.03872  [pdf, other]

    cs.CL cs.SD eess.AS

    BLSP-Emo: Towards Empathetic Large Speech-Language Models

    Authors: Chen Wang, Minpeng Liao, Zhongqiang Huang, Junhong Wu, Chengqing Zong, Jiajun Zhang

    Abstract: The recent release of GPT-4o showcased the potential of end-to-end multimodal models, not just in terms of low latency but also in their ability to understand and generate expressive speech with rich emotions. While the details are unknown to the open research community, it likely involves significant amounts of curated data and compute, neither of which is readily accessible. In this paper, we pr…

    Submitted 6 June, 2024; originally announced June 2024.

  27. arXiv:2406.02430  [pdf, other]

    eess.AS cs.SD

    Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

    Authors: Philip Anastassiou, Jiawei Chen, Jitong Chen, Yuanzhe Chen, Zhuo Chen, Ziyi Chen, Jian Cong, Lelai Deng, Chuang Ding, Lu Gao, Mingqing Gong, Peisong Huang, Qingqing Huang, Zhiying Huang, Yuanyuan Huo, Dongya Jia, Chumin Li, Feiya Li, Hui Li, Jiaxin Li, Xiaoyang Li, Xingxing Li, Lin Liu, Shouda Liu, Sichao Liu, et al. (21 additional authors not shown)

    Abstract: We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and sub…

    Submitted 4 June, 2024; originally announced June 2024.

  28. arXiv:2405.19204  [pdf, other]

    eess.IV cs.CV

    Contrastive-Adversarial and Diffusion: Exploring pre-training and fine-tuning strategies for sulcal identification

    Authors: Michail Mamalakis, Héloïse de Vareilles, Shun-Chin Jim Wu, Ingrid Agartz, Lynn Egeland Mørch-Johnsen, Jane Garrison, Jon Simons, Pietro Lio, John Suckling, Graham Murray

    Abstract: In the last decade, computer vision has witnessed the establishment of various training and learning approaches. Techniques like adversarial learning, contrastive learning, diffusion denoising learning, and ordinary reconstruction learning have become standard, representing state-of-the-art methods extensively employed for fully training or pre-training networks across various vision tasks. The ex…

    Submitted 29 May, 2024; originally announced May 2024.

  29. arXiv:2405.10948  [pdf, other]

    cs.CV cs.AI cs.RO eess.IV

    Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery

    Authors: Guankun Wang, Long Bai, Wan Jun Nah, Jie Wang, Zhaoxi Zhang, Zhen Chen, Jinlin Wu, Mobarakol Islam, Hongbin Liu, Hongliang Ren

    Abstract: Recent advancements in Surgical Visual Question Answering (Surgical-VQA) and related region grounding have shown great promise for robotic and medical applications, addressing the critical need for automated methods in personalized surgical mentorship. However, existing models primarily provide simple structured answers and struggle with complex scenarios due to their limited capability in recogni…

    Submitted 22 March, 2024; originally announced May 2024.

  30. arXiv:2405.09353  [pdf, other]

    eess.IV cs.CV

    Large coordinate kernel attention network for lightweight image super-resolution

    Authors: Fangwei Hao, Jiesheng Wu, Haotian Lu, Ji Du, Jing Xu, Xiaoxuan Xu

    Abstract: The multi-scale receptive field and large kernel attention (LKA) module have been shown to significantly improve performance in the lightweight image super-resolution task. However, existing lightweight super-resolution (SR) methods seldom pay attention to designing efficient building blocks with multi-scale receptive field for local modeling, and their LKA modules face a quadratic increase in comp…

    Submitted 30 August, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

    Comments: 13 pages

  31. arXiv:2405.04167  [pdf, other]

    cs.CV eess.IV

    Bridging the Synthetic-to-Authentic Gap: Distortion-Guided Unsupervised Domain Adaptation for Blind Image Quality Assessment

    Authors: Aobo Li, Jinjian Wu, Yongxu Liu, Leida Li

    Abstract: The annotation of blind image quality assessment (BIQA) is labor-intensive and time-consuming, especially for authentic images. Training on synthetic data is expected to be beneficial, but synthetically trained models often suffer from poor generalization in real domains due to domain gaps. In this work, we make a key observation that introducing more distortion types in the synthetic dataset may…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted by CVPR2024

  32. arXiv:2404.16905  [pdf, other]

    cs.CL cs.SD eess.AS

    Samsung Research China-Beijing at SemEval-2024 Task 3: A multi-stage framework for Emotion-Cause Pair Extraction in Conversations

    Authors: Shen Zhang, Haojie Zhang, Jing Zhang, Xudong Zhang, Yimeng Zhuang, Jinting Wu

    Abstract: In human-computer interaction, it is crucial for agents to respond to humans by understanding their emotions. Unraveling the causes of emotions is more challenging. A new task named Multimodal Emotion-Cause Pair Extraction in Conversations is responsible for recognizing emotion and identifying causal expressions. In this study, we propose a multi-stage framework to generate emotion and extract the…

    Submitted 25 April, 2024; originally announced April 2024.

  33. arXiv:2404.16357  [pdf, other]

    q-bio.NC eess.SY

    Reverse engineering the brain input: Network control theory to identify cognitive task-related control nodes

    Authors: Zhichao Liang, Yinuo Zhang, Jushen Wu, Quanying Liu

    Abstract: The human brain receives complex inputs when performing cognitive tasks, which range from external inputs via the senses to internal inputs from other brain regions. However, the explicit inputs to the brain during a cognitive task remain unclear. Here, we present an input identification framework for reverse engineering the control nodes and the corresponding inputs to the brain. The framework is…

    Submitted 25 April, 2024; originally announced April 2024.

  34. arXiv:2404.14716  [pdf, other]

    cs.CL cs.AI cs.CV cs.SD eess.AS

    Bayesian Example Selection Improves In-Context Learning for Speech, Text, and Visual Modalities

    Authors: Siyin Wang, Chao-Han Huck Yang, Ji Wu, Chao Zhang

    Abstract: Large language models (LLMs) can adapt to new tasks through in-context learning (ICL) based on a few examples presented in dialogue history without any model parameter update. Despite such convenience, the performance of ICL heavily depends on the quality of the in-context examples presented, which makes the in-context example selection approach a critical choice. This paper proposes a novel Bayes…

    Submitted 16 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 17 pages, 6 figures

  35. arXiv:2404.11171  [pdf, other]

    cs.LG cs.AI eess.SP

    Personalized Heart Disease Detection via ECG Digital Twin Generation

    Authors: Yaojun Hu, Jintai Chen, Lianting Hu, Dantong Li, Jiahuan Yan, Haochao Ying, Huiying Liang, Jian Wu

    Abstract: Heart diseases rank among the leading causes of global mortality, demonstrating a crucial need for early diagnosis and intervention. Most traditional electrocardiogram (ECG) based automated diagnosis methods are trained at population level, neglecting the customization of personalized ECGs to enhance individual healthcare management. A potential solution to address this limitation is to employ dig…

    Submitted 11 May, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

  36. arXiv:2404.10556  [pdf, other]

    cs.NI eess.SP

    Generative AI for Advanced UAV Networking

    Authors: Geng Sun, Wenwen Xie, Dusit Niyato, Hongyang Du, Jiawen Kang, Jing Wu, Sumei Sun, Ping Zhang

    Abstract: With the impressive achievements of ChatGPT and Sora, generative artificial intelligence (GAI) has received increasing attention. Not limited to the field of content generation, GAI is also widely used to solve the problems in wireless communication scenarios due to its powerful learning and generalization capabilities. Therefore, we discuss key applications of GAI in improving unmanned aerial veh…

    Submitted 16 April, 2024; originally announced April 2024.

  37. arXiv:2404.02999  [pdf, other]

    eess.IV cs.CV

    MeshBrush: Painting the Anatomical Mesh with Neural Stylization for Endoscopy

    Authors: John J. Han, Ayberk Acar, Nicholas Kavoussi, Jie Ying Wu

    Abstract: Style transfer is a promising approach to close the sim-to-real gap in medical endoscopy. Rendering realistic endoscopic videos by traversing pre-operative scans (such as MRI or CT) can generate realistic simulations as well as ground truth camera poses and depth maps. Although image-to-image (I2I) translation models such as CycleGAN perform well, they are unsuitable for video-to-video synthesis d…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: 10 pages, 5 figures

  38. arXiv:2403.18607  [pdf, other]

    cs.CR cs.AI eess.SP

    Spikewhisper: Temporal Spike Backdoor Attacks on Federated Neuromorphic Learning over Low-power Devices

    Authors: Hanqing Fu, Gaolei Li, Jun Wu, Jianhua Li, Xi Lin, Kai Zhou, Yuchen Liu

    Abstract: Federated neuromorphic learning (FedNL) leverages event-driven spiking neural networks and federated learning frameworks to effectively execute intelligent analysis tasks across many distributed low-power devices, but is also vulnerable to poisoning attacks. The threat of backdoor attacks on traditional deep neural networks typically comes from time-invariant data. However, in FedNL, un…

    Submitted 27 March, 2024; originally announced March 2024.

  39. arXiv:2403.16476  [pdf]

    eess.IV

    A Method for Target Detection Based on Mmw Radar and Vision Fusion

    Authors: Ming Zong, Jiaying Wu, Zhanyu Zhu, Jingen Ni

    Abstract: An efficient and accurate traffic monitoring system often takes advantage of multi-sensor detection to ensure the safety of urban traffic, promoting the accuracy and robustness of target detection and tracking. A method for target detection using Radar-Vision Fusion Path Aggregation Fully Convolutional One-Stage Network (RV-PAFCOS) is proposed in this paper, which is extended from Fully Convoluti…

    Submitted 25 March, 2024; originally announced March 2024.

  40. arXiv:2403.15410  [pdf, other]

    eess.SP eess.SY

    Secure and Energy-efficient Unmanned Aerial Vehicle-enabled Visible Light Communication via A Multi-objective Optimization Approach

    Authors: Lingling Liu, Aimin Wang, Jing Wu, Jiao Lu, Jiahui Li, Geng Sun

    Abstract: In this research, a unique approach to provide communication service for terrestrial receivers using unmanned aerial vehicle-enabled visible light communication is investigated. Specifically, we take into account an unmanned aerial vehicle-enabled visible light communication scenario with multiplex transmitters, multiplex receivers, and a single eavesdropper, each of which is equipped with a si…

    Submitted 3 March, 2024; originally announced March 2024.

    Comments: 18 pages, 9 tables, 3 tables

  41. arXiv:2403.10931  [pdf, other]

    eess.IV cs.CV

    Uncertainty-Aware Adapter: Adapting Segment Anything Model (SAM) for Ambiguous Medical Image Segmentation

    Authors: Mingzhou Jiang, Jiaying Zhou, Junde Wu, Tianyang Wang, Yueming Jin, Min Xu

    Abstract: The Segment Anything Model (SAM) gained significant success in natural image segmentation, and many methods have tried to fine-tune it to medical image segmentation. An efficient way to do so is by using Adapters, specialized modules that learn just a few parameters to tailor SAM specifically for medical images. However, unlike natural images, many tissues and lesions in medical images have blurry…

    Submitted 18 March, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

  42. arXiv:2403.08781  [pdf, other]

    eess.SY

    Bounded-Time Nonblocking Supervisory Control of Timed Discrete-Event Systems

    Authors: Renyuan Zhang, Jiale Wu, Junhua Gou, Yabo Zhu, Kai Cai

    Abstract: Recently an automaton property of quantitative nonblockingness was proposed in supervisory control of untimed discrete-event systems (DES), which quantifies the standard nonblocking property by capturing the practical requirement that all tasks be completed within a bounded number of steps. However, in practice tasks may be further required to be completed within a specific time. To meet this new requir…

    Submitted 29 July, 2024; v1 submitted 27 January, 2024; originally announced March 2024.

    Comments: arXiv admin note: text overlap with arXiv:2108.00721

  43. arXiv:2403.04290  [pdf, other]

    eess.IV cs.CV cs.LG

    MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant

    Authors: Chenlu Zhan, Yu Lin, Gaoang Wang, Hongwei Wang, Jian Wu

    Abstract: Medical generative models, acknowledged for their high-quality sample generation ability, have accelerated the fast growth of medical applications. However, recent works concentrate on separate medical generation models for distinct medical tasks and are restricted to inadequate medical multi-modal knowledge, constraining comprehensive medical diagnosis. In this paper, we propose MedM2G, a Medical…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR2024

  44. arXiv:2403.02566  [pdf, other]

    eess.IV cs.CV

    Enhancing Weakly Supervised 3D Medical Image Segmentation through Probabilistic-aware Learning

    Authors: Zhaoxin Fan, Runmin Jiang, Junhao Wu, Xin Huang, Tianyang Wang, Heng Huang, Min Xu

    Abstract: 3D medical image segmentation is a challenging task with crucial implications for disease diagnosis and treatment planning. Recent advances in deep learning have significantly enhanced fully supervised medical image segmentation. However, this approach heavily relies on labor-intensive and time-consuming fully annotated ground-truth labels, particularly for 3D volumes. To overcome this limitation,…

    Submitted 4 March, 2024; originally announced March 2024.

  45. arXiv:2403.01132  [pdf]

    cs.LG cs.SD eess.AS

    MPIPN: A Multi Physics-Informed PointNet for solving parametric acoustic-structure systems

    Authors: Chu Wang, Jinhong Wu, Yanzhi Wang, Zhijian Zha, Qi Zhou

    Abstract: Machine learning is employed for solving physical systems governed by general nonlinear partial differential equations (PDEs). However, complex multi-physics systems such as acoustic-structure coupling are often described by a series of PDEs that incorporate variable physical quantities, which are referred to as parametric systems. There is a lack of strategies for solving parametric systems govern…

    Submitted 2 March, 2024; originally announced March 2024.

    Comments: The number of figures is 16. The number of tables is 5. The number of words is 9717

  46. arXiv:2402.18192  [pdf, other]

    cs.CV eess.IV

    Misalignment-Robust Frequency Distribution Loss for Image Transformation

    Authors: Zhangkai Ni, Juncheng Wu, Zian Wang, Wenhan Yang, Hanli Wang, Lin Ma

    Abstract: This paper aims to address a common challenge in deep learning-based image transformation methods, such as image enhancement and super-resolution, which heavily rely on precisely aligned paired datasets with pixel-level alignments. However, creating precisely aligned paired images presents significant challenges and hinders the advancement of methods trained on such data. To overcome this challeng…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: Accepted to Computer Vision and Pattern Recognition Conference (CVPR) 2024

  47. arXiv:2402.17502  [pdf, other]

    cs.CV eess.IV

    FedLPPA: Learning Personalized Prompt and Aggregation for Federated Weakly-supervised Medical Image Segmentation

    Authors: Li Lin, Yixiang Liu, Jiewei Wu, Pujin Cheng, Zhiyuan Cai, Kenneth K. Y. Wong, Xiaoying Tang

    Abstract: Federated learning (FL) effectively mitigates the data silo challenge brought about by policies and privacy concerns, implicitly harnessing more data for deep model training. However, traditional centralized FL models grapple with diverse multi-center data, especially in the face of significant data heterogeneity, notably in medical contexts. In the realm of medical image segmentation, the growing…

    Submitted 31 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: 12 pages, 10 figures

  48. arXiv:2402.16153  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG cs.MM eess.AS

    ChatMusician: Understanding and Generating Music Intrinsically with LLM

    Authors: Ruibin Yuan, Hanfeng Lin, Yi Wang, Zeyue Tian, Shangda Wu, Tianhao Shen, Ge Zhang, Yuhang Wu, Cong Liu, Ziya Zhou, Ziyang Ma, Liumeng Xue, Ziyu Wang, Qin Liu, Tianyu Zheng, Yizhi Li, Yinghao Ma, Yiming Liang, Xiaowei Chi, Ruibo Liu, Zili Wang, Pengfei Li, Jingcheng Wu, Chenghua Lin, Qifeng Liu, et al. (10 additional authors not shown)

    Abstract: While Large Language Models (LLMs) demonstrate impressive capabilities in text generation, we find that their ability has yet to be generalized to music, humanity's creative language. We introduce ChatMusician, an open-source LLM that integrates intrinsic musical abilities. It is based on continual pre-training and finetuning LLaMA2 on a text-compatible music representation, ABC notation, and the…

    Submitted 25 February, 2024; originally announced February 2024.

    Comments: GitHub: https://shanghaicannon.github.io/ChatMusician/

  49. arXiv:2402.14136  [pdf, other]

    cs.RO cs.LG eess.SP

    GDTM: An Indoor Geospatial Tracking Dataset with Distributed Multimodal Sensors

    Authors: Ho Lyun Jeong, Ziqi Wang, Colin Samplawski, Jason Wu, Shiwei Fang, Lance M. Kaplan, Deepak Ganesan, Benjamin Marlin, Mani Srivastava

    Abstract: Constantly locating moving objects, i.e., geospatial tracking, is essential for autonomous building infrastructure. Accurate and robust geospatial tracking often leverages multimodal sensor fusion algorithms, which require large datasets with time-aligned, synchronized data from various sensor types. However, such datasets are not readily available. Hence, we propose GDTM, a nine-hour dataset for…

    Submitted 21 February, 2024; originally announced February 2024.

  50. arXiv:2402.02292  [pdf]

    physics.comp-ph eess.SP

    A 3-D Full-Wave Model to Study the Impact of Soybean Components and Structure on L-Band Backscatter

    Authors: Kaiser Niknam, Jasmeet Judge, A. Kaleo Roberts, Alejandro Monsivais-Huertero, Robert Moore, Kamal Sarabandi, Jiayi Wu

    Abstract: Microwave remote sensing offers a powerful tool for monitoring the growth of short, dense vegetation like soybean. As the plants mature, changes in their biomass and 3-D structure impact the electromagnetic (EM) backscatter signal. This backscatter information holds valuable insights into crop health and yield, prompting the need for a comprehensive understanding of how structural and biophysical…

    Submitted 3 February, 2024; originally announced February 2024.