Showing 1–50 of 487 results for author: Liu, H

Searching in archive eess.
  1. arXiv:2408.17449  [pdf, other]

    eess.SP

    Performance Analysis of Pair-wise Symbol Detection in Uplink NOMA-ISaC Systems

    Authors: Haofeng Liu, Emad Alsusa, Arafat Al-Dweik

    Abstract: This paper investigates the bit error rate (BER) and outage probability performance of integrated sensing and communication (ISaC) in uplink non-orthogonal multiple access (NOMA) based Internet of Things (IoT) systems. Specifically, we consider an ISaC system where the radar signal is designed to be orthogonal to the communication signal over two symbol periods so that its interference on the comm…

    Submitted 30 August, 2024; originally announced August 2024.

  2. arXiv:2408.15632  [pdf, other]

    eess.SY cs.AI

    Structural Optimization of Lightweight Bipedal Robot via SERL

    Authors: Yi Cheng, Chenxi Han, Yuheng Min, Linqi Ye, Houde Liu, Hang Liu

    Abstract: Designing a bipedal robot is a complex and challenging task, especially when dealing with a multitude of structural parameters. Traditional design methods often rely on human intuition and experience. However, such approaches are time-consuming, labor-intensive, lack theoretical guidance and hard to obtain optimal design results within vast design spaces, thus failing to full exploit the inherent…

    Submitted 28 August, 2024; originally announced August 2024.

  3. arXiv:2408.14585  [pdf, other]

    cs.CV cs.SD eess.AS

    Global-Local Distillation Network-Based Audio-Visual Speaker Tracking with Incomplete Modalities

    Authors: Yidi Li, Yihan Li, Yixin Guo, Bin Ren, Zhenhuan Xu, Hao Guo, Hong Liu, Nicu Sebe

    Abstract: In speaker tracking research, integrating and complementing multi-modal data is a crucial strategy for improving the accuracy and robustness of tracking systems. However, tracking with incomplete modalities remains a challenging issue due to noisy observations caused by occlusion, acoustic noise, and sensor failures. Especially when there is missing data in multiple modalities, the performance of…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: Audio-Visual Speaker Tracking with Incomplete Modalities

  4. arXiv:2408.09357  [pdf, other]

    cs.GR cs.AI cs.SD eess.AS

    Meta-Learning Empowered Meta-Face: Personalized Speaking Style Adaptation for Audio-Driven 3D Talking Face Animation

    Authors: Xukun Zhou, Fengxin Li, Ziqiao Peng, Kejian Wu, Jun He, Biao Qin, Zhaoxin Fan, Hongyan Liu

    Abstract: Audio-driven 3D face animation is increasingly vital in live streaming and augmented reality applications. While remarkable progress has been observed, most existing approaches are designed for specific individuals with predefined speaking styles, thus neglecting the adaptability to varied speaking styles. To address this limitation, this paper introduces MetaFace, a novel methodology meticulously…

    Submitted 18 August, 2024; originally announced August 2024.

  5. arXiv:2408.07931  [pdf, other]

    cs.CV cs.AI cs.RO eess.IV

    Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning

    Authors: Haofeng Liu, Erli Zhang, Junde Wu, Mingxuan Hong, Yueming Jin

    Abstract: Surgical video segmentation is a critical task in computer-assisted surgery and is vital for enhancing surgical quality and patient outcomes. Recently, the Segment Anything Model 2 (SAM2) framework has shown superior advancements in image and video segmentation. However, SAM2 struggles with efficiency due to the high computational demands of processing high-resolution images and complex and long-r…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: 16 pages, 2 figures

  6. Advancing Multi-grained Alignment for Contrastive Language-Audio Pre-training

    Authors: Yiming Li, Zhifang Guo, Xiangdong Wang, Hong Liu

    Abstract: Recent advances have been witnessed in audio-language joint learning, such as CLAP, that shows much success in multi-modal understanding tasks. These models usually aggregate uni-modal local representations, namely frame or word features, into global ones, on which the contrastive loss is employed to reach coarse-grained cross-modal alignment. However, frame-level correspondence with texts may be…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: ACM MM 2024 (Oral)

  7. arXiv:2408.06790  [pdf, other]

    eess.SY

    Residual Deep Reinforcement Learning for Inverter-based Volt-Var Control

    Authors: Qiong Liu, Ye Guo, Lirong Deng, Haotian Liu, Dongyu Li, Hongbin Sun

    Abstract: A residual deep reinforcement learning (RDRL) approach is proposed by integrating DRL with model-based optimization for inverter-based volt-var control in active distribution networks when the accurate power flow model is unknown. RDRL learns a residual action with a reduced residual action space, based on the action of the model-based approach with an approximate model. RDRL inherits the control…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: text overlap with arXiv:2210.07360

  8. arXiv:2408.05777  [pdf, other]

    cs.CV cs.AI eess.IV

    Seg-CycleGAN : SAR-to-optical image translation guided by a downstream task

    Authors: Hannuo Zhang, Huihui Li, Jiarui Lin, Yujie Zhang, Jianghua Fan, Hang Liu

    Abstract: Optical remote sensing and Synthetic Aperture Radar(SAR) remote sensing are crucial for earth observation, offering complementary capabilities. While optical sensors provide high-quality images, they are limited by weather and lighting conditions. In contrast, SAR sensors can operate effectively under adverse conditions. This letter proposes a GAN-based SAR-to-optical image translation method name…

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: 8 pages, 5 figures

  9. arXiv:2408.04912  [pdf, other]

    cs.SD cs.CE cs.ET cs.LG eess.AS

    AcousAF: Acoustic Sensing-Based Atrial Fibrillation Detection System for Mobile Phones

    Authors: Xuanyu Liu, Haoxian Liu, Jiao Li, Zongqi Yang, Yi Huang, Jin Zhang

    Abstract: Atrial fibrillation (AF) is characterized by irregular electrical impulses originating in the atria, which can lead to severe complications and even death. Due to the intermittent nature of the AF, early and timely monitoring of AF is critical for patients to prevent further exacerbation of the condition. Although ambulatory ECG Holter monitors provide accurate monitoring, the high cost of these d…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: Accepted for publication in Companion of the 2024 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp Companion '24)

  10. arXiv:2408.04837  [pdf, ps, other]

    cs.IT eess.SP

    Multi-User MISO with Stacked Intelligent Metasurfaces: A DRL-Based Sum-Rate Optimization Approach

    Authors: Hao Liu, Jiancheng An, George C. Alexandropoulos, Derrick Wing Kwan Ng, Chau Yuen, Lu Gan

    Abstract: Stacked intelligent metasurfaces (SIMs) represent a novel signal processing paradigm that enables over-the-air processing of electromagnetic waves at the speed of light. Their multi-layer architecture exhibits customizable computational capabilities compared to conventional single-layer reconfigurable intelligent surfaces and metasurface lenses. In this paper, we deploy SIM to improve the performa…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: 34 pages, 11 figures, 2 tables. arXiv admin note: text overlap with arXiv:2402.09006

  11. arXiv:2408.04777  [pdf]

    eess.IV cs.CV

    Deep Learning-based Unsupervised Domain Adaptation via a Unified Model for Prostate Lesion Detection Using Multisite Bi-parametric MRI Datasets

    Authors: Hao Li, Han Liu, Heinrich von Busch, Robert Grimm, Henkjan Huisman, Angela Tong, David Winkel, Tobias Penzkofer, Ivan Shabunin, Moon Hyung Choi, Qingsong Yang, Dieter Szolar, Steven Shea, Fergus Coakley, Mukesh Harisinghani, Ipek Oguz, Dorin Comaniciu, Ali Kamen, Bin Lou

    Abstract: Our hypothesis is that UDA using diffusion-weighted images, generated with a unified model, offers a promising and reliable strategy for enhancing the performance of supervised learning models in multi-site prostate lesion detection, especially when various b-values are present. This retrospective study included data from 5,150 patients (14,191 samples) collected across nine different imaging cent…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: Accept at Radiology: Artificial Intelligence. Journal reference and external DOI will be added once published

  12. arXiv:2408.02586  [pdf, other]

    cs.IT eess.SP

    Massive MIMO-OTFS-Based Random Access for Cooperative LEO Satellite Constellations

    Authors: Boxiao Shen, Yongpeng Wu, Shiqi Gong, Heng Liu, Björn Ottersten, Wenjun Zhang

    Abstract: This paper investigates joint device identification, channel estimation, and symbol detection for cooperative multi-satellite-enhanced random access, where orthogonal time-frequency space modulation with the large antenna array is utilized to combat the dynamics of the terrestrial-satellite links (TSLs). We introduce the generalized complex exponential basis expansion model to parameterize TSLs, t…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: This paper has been accepted by IEEE Journal on Selected Areas in Communications

  13. arXiv:2407.21394  [pdf, other]

    eess.IV cs.CV

    Force Sensing Guided Artery-Vein Segmentation via Sequential Ultrasound Images

    Authors: Yimeng Geng, Gaofeng Meng, Mingcong Chen, Guanglin Cao, Mingyang Zhao, Jianbo Zhao, Hongbin Liu

    Abstract: Accurate identification of arteries and veins in ultrasound images is crucial for vascular examinations and interventions in robotics-assisted surgeries. However, current methods for ultrasound vessel segmentation face challenges in distinguishing between arteries and veins due to their morphological similarities. To address this challenge, this study introduces a novel force sensing guided segmen…

    Submitted 31 July, 2024; originally announced July 2024.

  14. arXiv:2407.20622  [pdf, other]

    cs.CL cs.SD eess.AS

    Decoding Linguistic Representations of Human Brain

    Authors: Yu Wang, Heyang Liu, Yuhao Wang, Chuan Xuan, Yixuan Hou, Sheng Feng, Hongcheng Liu, Yusheng Liao, Yanfeng Wang

    Abstract: Language, as an information medium created by advanced organisms, has always been a concern of neuroscience regarding how it is represented in the brain. Decoding linguistic representations in the evoked brain has shown groundbreaking achievements, thanks to the rapid improvement of neuroimaging, medical technology, life sciences and artificial intelligence. In this work, we present a taxonomy of…

    Submitted 30 July, 2024; originally announced July 2024.

  15. arXiv:2407.17944  [pdf, other]

    cs.RO eess.SY

    Time-Optimal Planning for Long-Range Quadrotor Flights: An Automatic Optimal Synthesis Approach

    Authors: Chao Qin, Jingxiang Chen, Yifan Lin, Abhishek Goudar, Angela P. Schoellig, Hugh H. -T. Liu

    Abstract: Time-critical tasks such as drone racing typically cover large operation areas. However, it is difficult and computationally intensive for current time-optimal motion planners to accommodate long flight distances since a large yet unknown number of knot points is required to represent the trajectory. We present a polynomial-based automatic optimal synthesis (AOS) approach that can address this cha…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: 19 pages, 19 figures

  16. arXiv:2407.14329  [pdf, other]

    cs.SD eess.AS

    Efficient Audio Captioning with Encoder-Level Knowledge Distillation

    Authors: Xuenan Xu, Haohe Liu, Mengyue Wu, Wenwu Wang, Mark D. Plumbley

    Abstract: Significant improvement has been achieved in automated audio captioning (AAC) with recent models. However, these models have become increasingly large as their performance is enhanced. In this work, we propose a knowledge distillation (KD) framework for AAC. Our analysis shows that in the encoder-decoder based AAC models, it is more effective to distill knowledge into the encoder as compared with…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: Interspeech 2024

  17. arXiv:2407.13220  [pdf, other]

    eess.AS cs.SD

    MEDIC: Zero-shot Music Editing with Disentangled Inversion Control

    Authors: Huadai Liu, Jialei Wang, Rongjie Huang, Yang Liu, Jiayang Xu, Zhou Zhao

    Abstract: Text-guided diffusion models catalyze a paradigm shift in audio generation, facilitating the adaptability of source audio to conform to specific textual prompts. Recent advancements introduce inversion techniques, like DDIM inversion, to zero-shot editing, exploiting pre-trained diffusion models for audio modification. Nonetheless, our investigation exposes that DDIM inversion suffers from an accu…

    Submitted 20 August, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

  18. arXiv:2407.03566  [pdf, ps, other]

    cs.IT eess.SP

    Stacked Intelligent Metasurfaces for Wireless Sensing and Communication: Applications and Challenges

    Authors: Hao Liu, Jiancheng An, Xing Jia, Shining Lin, Xianghao Yao, Lu Gan, Bruno Clerckx, Chau Yuen, Mehdi Bennis, Mérouane Debbah

    Abstract: The rapid advancement of wireless communication technologies has precipitated an unprecedented demand for high data rates, extremely low latency, and ubiquitous connectivity. In order to achieve these goals, stacked intelligent metasurfaces (SIM) has been developed as a novel solution to perform advanced signal processing tasks directly in the electromagnetic wave domain, thus achieving ultra-fast…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 8 pages, 5 figures, 1 table

  19. arXiv:2407.03374  [pdf]

    cs.AI cs.SE eess.SP eess.SY

    An Outline of Prognostics and Health Management Large Model: Concepts, Paradigms, and Challenges

    Authors: Laifa Tao, Shangyu Li, Haifei Liu, Qixuan Huang, Liang Ma, Guoao Ning, Yiling Chen, Yunlong Wu, Bin Li, Weiwei Zhang, Zhengduo Zhao, Wenchao Zhan, Wenyan Cao, Chao Wang, Hongmei Liu, Jian Ma, Mingliang Suo, Yujie Cheng, Yu Ding, Dengwei Song, Chen Lu

    Abstract: Prognosis and Health Management (PHM), critical for ensuring task completion by complex systems and preventing unexpected failures, is widely adopted in aerospace, manufacturing, maritime, rail, energy, etc. However, PHM's development is constrained by bottlenecks like generalization, interpretation and verification abilities. Presently, generative artificial intelligence (AI), represented by Larg…

    Submitted 1 July, 2024; originally announced July 2024.

  20. arXiv:2407.03188  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS

    MuDiT & MuSiT: Alignment with Colloquial Expression in Description-to-Song Generation

    Authors: Zihao Wang, Haoxuan Liu, Jiaxing Yu, Tao Zhang, Yan Liu, Kejun Zhang

    Abstract: Amid the rising intersection of generative AI and human artistic processes, this study probes the critical yet less-explored terrain of alignment in human-centric automatic song composition. We propose a novel task of Colloquial Description-to-Song Generation, which focuses on aligning the generated content with colloquial human expressions. This task is aimed at bridging the gap between colloquia…

    Submitted 10 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: 19 pages, 5 figures

    MSC Class: 68Txx(Primary)14F05; 91Fxx(Secondary)

    ACM Class: I.2.7; J.5

  21. arXiv:2407.01421  [pdf]

    eess.SY

    C-MP: A decentralized adaptive-coordinated traffic signal control using the Max Pressure framework

    Authors: Tanveer Ahmed, Hao Liu, Vikash V. Gayah

    Abstract: Coordinated traffic signals seek to provide uninterrupted flow through a series of closely spaced intersections, typically using pre-defined fixed signal timings and offsets. Adaptive traffic signals dynamically change signal timings based on observed traffic conditions in a way that might disrupt coordinated movements, particularly when these decisions are made independently at each intersection.…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Submitted to Transportation Research Part C: Emerging Technologies

  22. arXiv:2407.00579  [pdf, ps, other]

    cs.IT eess.SP

    Active-RIS-Aided Covert Communications in NOMA-Inspired ISAC Wireless Systems

    Authors: Miaomiao Zhu, Pengxu Chen, Liang Yang, Alexandros-Apostolos A. Boulogeorgos, Theodoros A. Tsiftsis, Hongwu Liu

    Abstract: Non-orthogonal multiple access (NOMA)-inspired integrated sensing and communication (ISAC) facilitates spectrum sharing for radar sensing and NOMA communications, whereas facing privacy and security challenges due to open wireless propagation. In this paper, active reconfigurable intelligent surface (RIS) is employed to aid covert communications in NOMA-inspired ISAC wireless system with the aim o…

    Submitted 29 June, 2024; originally announced July 2024.

  23. arXiv:2406.19305  [pdf, other]

    eess.SY

    A Max Pressure Algorithm for Traffic Signals Considering Pedestrian Queues

    Authors: Hao Liu, Vikash V. Gayah, Michael Levin

    Abstract: This paper proposes a novel max-pressure (MP) algorithm that incorporates pedestrian traffic into the MP control architecture. Pedestrians are modeled as being included in one of two groups: those walking on sidewalks and those queued at intersections waiting to cross. Traffic dynamics models for both groups are developed. Under the proposed control policy, the signal timings are determined based…

    Submitted 27 June, 2024; originally announced June 2024.

  24. OCC-MP: A Max-Pressure framework to prioritize transit and high occupancy vehicles

    Authors: Tanveer Ahmed, Hao Liu, Vikash V. Gayah

    Abstract: Max-pressure (MP) is a decentralized adaptive traffic signal control approach that has been shown to maximize throughput for private vehicles. However, MP-based signal control algorithms do not differentiate the movement of transit vehicles from private vehicles or between high and single-occupancy private vehicles. Prioritizing the movement of transit or other high occupancy vehicles (HOVs) is vi…

    Submitted 4 August, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: This paper will be published in Transportation Research Part C: Emerging Technologies

    Journal ref: Transportation Research Part C: Emerging Technologies 166 (2024): 104795

  25. arXiv:2406.17800  [pdf, other]

    q-bio.QM cs.SD eess.AS

    Fish Tracking, Counting, and Behaviour Analysis in Digital Aquaculture: A Comprehensive Review

    Authors: Meng Cui, Xubo Liu, Haohe Liu, Jinzheng Zhao, Daoliang Li, Wenwu Wang

    Abstract: Digital aquaculture leverages advanced technologies and data-driven methods, providing substantial benefits over traditional aquaculture practices. Fish tracking, counting, and behaviour analysis are crucial components of digital aquaculture, which are essential for optimizing production efficiency, enhancing fish welfare, and improving resource management. Previous reviews have focused on single…

    Submitted 20 June, 2024; originally announced June 2024.

  26. arXiv:2406.16058  [pdf, other]

    eess.AS

    Text-Queried Target Sound Event Localization

    Authors: Jinzheng Zhao, Xinyuan Qian, Yong Xu, Haohe Liu, Yin Cao, Davide Berghi, Wenwu Wang

    Abstract: Sound event localization and detection (SELD) aims to determine the appearance of sound classes, together with their Direction of Arrival (DOA). However, current SELD systems can only predict the activities of specific classes, for example, 13 classes in DCASE challenges. In this paper, we propose text-queried target sound event localization (SEL), a new paradigm that allows the user to input the…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: Accepted by EUSIPCO 2024

  27. arXiv:2406.15716  [pdf, other]

    eess.IV cs.CV

    Predicting fluorescent labels in label-free microscopy images with pix2pix and adaptive loss in Light My Cells challenge

    Authors: Han Liu, Hao Li, Jiacheng Wang, Yubo Fan, Zhoubing Xu, Ipek Oguz

    Abstract: Fluorescence labeling is the standard approach to reveal cellular structures and other subcellular constituents for microscopy images. However, this invasive procedure may perturb or even kill the cells and the procedure itself is highly time-consuming and complex. Recently, in silico labeling has emerged as a promising alternative, aiming to use machine learning models to directly predict the flu…

    Submitted 21 June, 2024; originally announced June 2024.

  28. arXiv:2406.14977  [pdf, other]

    cs.AI eess.IV

    Trustworthy Enhanced Multi-view Multi-modal Alzheimer's Disease Prediction with Brain-wide Imaging Transcriptomics Data

    Authors: Shan Cong, Zhoujie Fan, Hongwei Liu, Yinghan Zhang, Xin Wang, Haoran Luo, Xiaohui Yao

    Abstract: Brain transcriptomics provides insights into the molecular mechanisms by which the brain coordinates its functions and processes. However, existing multimodal methods for predicting Alzheimer's disease (AD) primarily rely on imaging and sometimes genetic data, often neglecting the transcriptomic basis of brain. Furthermore, while striving to integrate complementary information between modalities,…

    Submitted 21 June, 2024; originally announced June 2024.

  29. arXiv:2406.13705  [pdf, other]

    eess.IV cs.AI cs.CV

    EndoUIC: Promptable Diffusion Transformer for Unified Illumination Correction in Capsule Endoscopy

    Authors: Long Bai, Tong Chen, Qiaozhi Tan, Wan Jun Nah, Yanheng Li, Zhicheng He, Sishen Yuan, Zhen Chen, Jinlin Wu, Mobarakol Islam, Zhen Li, Hongbin Liu, Hongliang Ren

    Abstract: Wireless Capsule Endoscopy (WCE) is highly valued for its non-invasive and painless approach, though its effectiveness is compromised by uneven illumination from hardware constraints and complex internal dynamics, leading to overexposed or underexposed images. While researchers have discussed the challenges of low-light enhancement in WCE, the issue of correcting for different exposure levels rema…

    Submitted 8 July, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: To appear in MICCAI 2024. Code and dataset availability: https://github.com/longbai1006/EndoUIC

  30. arXiv:2406.12596   

    eess.SP

    Beyond Near-Field: Far-Field Location Division Multiple Access in Downlink MIMO Systems

    Authors: Haoyan Liu, Caijian Jie, Min Yang, Chengguang Li

    Abstract: Exploring channel dimensions has been the driving force behind breakthroughs in successive generations of mobile communication systems. In 5G, space division multiple access (SDMA) leveraging massive MIMO has been crucial in enhancing system capacity through spatial differentiation of users. However, SDMA can only finely distinguish users at adjacent angles in ultra-dense networks by extremely lar…

    Submitted 5 August, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: We have omitted an important detail of the baseband equivalent model, which may mislead the reader. We are currently trying to resolve this issue, please withdraw our submission

  31. arXiv:2406.12254  [pdf, other]

    eess.IV cs.CV

    Enhancing Single-Slice Segmentation with 3D-to-2D Unpaired Scan Distillation

    Authors: Xin Yu, Qi Yang, Han Liu, Ho Hin Lee, Yucheng Tang, Lucas W. Remedios, Michael E. Kim, Rendong Zhang, Shunxing Bao, Yuankai Huo, Ann Zenobia Moore, Luigi Ferrucci, Bennett A. Landman

    Abstract: 2D single-slice abdominal computed tomography (CT) enables the assessment of body habitus and organ health with low radiation exposure. However, single-slice data necessitates the use of 2D networks for segmentation, but these networks often struggle to capture contextual information effectively. Consequently, even when trained on identical datasets, 3D networks typically achieve superior segmenta…

    Submitted 12 July, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  32. arXiv:2406.11568  [pdf, other]

    cs.CL cs.SD eess.AS q-bio.NC

    Towards an End-to-End Framework for Invasive Brain Signal Decoding with Large Language Models

    Authors: Sheng Feng, Heyang Liu, Yu Wang, Yanfeng Wang

    Abstract: In this paper, we introduce a groundbreaking end-to-end (E2E) framework for decoding invasive brain signals, marking a significant advancement in the field of speech neuroprosthesis. Our methodology leverages the comprehensive reasoning abilities of large language models (LLMs) to facilitate direct decoding. By fully integrating LLMs, we achieve results comparable to the state-of-the-art cascade m…

    Submitted 17 June, 2024; originally announced June 2024.

  33. arXiv:2406.11336  [pdf, other]

    eess.SY

    LFPLM: A General and Flexible Load Forecasting Framework based on Pre-trained Language Model

    Authors: Mingyang Gao, Suyang Zhou, Wei Gu, Zhi Wu, Zijian Hu, Hong Zhu, Haiquan Liu

    Abstract: Accurate load forecasting is essential for maintaining the power balance between generators and consumers, especially with the increasing integration of renewable energy sources, which introduce significant intermittent volatility. With the development of data-driven methods, machine learning and deep learning-based models have become the predominant approach for load forecasting tasks. In recent…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 7 pages, 5 figures and 5 tables

  34. arXiv:2406.08837  [pdf]

    eess.IV cs.CV cs.LG

    Research on Deep Learning Model of Feature Extraction Based on Convolutional Neural Network

    Authors: Houze Liu, Iris Li, Yaxin Liang, Dan Sun, Yining Yang, Haowei Yang

    Abstract: Neural networks with relatively shallow layers and simple structures may have limited ability in accurately identifying pneumonia. In addition, deep neural networks also have a large demand for computing resources, which may cause convolutional neural networks to be unable to be implemented on terminals. Therefore, this paper will carry out the optimal classification of convolutional neural networ…

    Submitted 13 June, 2024; originally announced June 2024.

  35. arXiv:2406.06979  [pdf, other]

    cs.LG cs.CR cs.SD eess.AS

    AudioMarkBench: Benchmarking Robustness of Audio Watermarking

    Authors: Hongbin Liu, Moyang Guo, Zhengyuan Jiang, Lun Wang, Neil Zhenqiang Gong

    Abstract: The increasing realism of synthetic speech, driven by advancements in text-to-speech models, raises ethical concerns regarding impersonation and disinformation. Audio watermarking offers a promising solution via embedding human-imperceptible watermarks into AI-generated audios. However, the robustness of audio watermarking against common/adversarial perturbations remains understudied. We present A…

    Submitted 11 June, 2024; originally announced June 2024.

  36. Multi-Objective Sizing Optimization Method of Microgrid Considering Cost and Carbon Emissions

    Authors: Xiang Zhu, Guangchun Ruan, Hua Geng, Honghai Liu, Mingfei Bai, Chao Peng

    Abstract: Microgrid serves as a promising solution to integrate and manage distributed renewable energy resources. In this paper, we establish a stochastic multi-objective sizing optimization (SMOSO) model for microgrid planning, which fully captures the battery degradation characteristics and the total carbon emissions. The microgrid operator aims to simultaneously maximize the economic benefits and minimi…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE Transactions on Industry Applications

  37. arXiv:2406.06295  [pdf, other]

    cs.SD eess.AS

    Zero-Shot Audio Captioning Using Soft and Hard Prompts

    Authors: Yiming Zhang, Xuenan Xu, Ruoyi Du, Haohe Liu, Yuan Dong, Zheng-Hua Tan, Wenwu Wang, Zhanyu Ma

    Abstract: In traditional audio captioning methods, a model is usually trained in a fully supervised manner using a human-annotated dataset containing audio-text pairs and then evaluated on the test sets from the same dataset. Such methods have two limitations. First, these methods are often data-hungry and require time-consuming and expensive human annotations to obtain audio-text pairs. Second, these model…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing

  38. arXiv:2406.02918  [pdf, other]

    eess.IV cs.CV

    U-KAN Makes Strong Backbone for Medical Image Segmentation and Generation

    Authors: Chenxin Li, Xinyu Liu, Wuyang Li, Cheng Wang, Hengyu Liu, Yifan Liu, Zhen Chen, Yixuan Yuan

    Abstract: U-Net has become a cornerstone in various visual applications such as image segmentation and diffusion probability models. While numerous innovative designs and improvements have been introduced by incorporating transformers or MLPs, the networks are still limited to linearly modeling patterns as well as the deficient interpretability. To address these challenges, our intuition is inspired by the…

    Submitted 22 August, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

  39. arXiv:2406.00356  [pdf, other]

    eess.AS cs.SD

    AudioLCM: Text-to-Audio Generation with Latent Consistency Models

    Authors: Huadai Liu, Rongjie Huang, Yang Liu, Hengyuan Cao, Jialei Wang, Xize Cheng, Siqi Zheng, Zhou Zhao

    Abstract: Recent advancements in Latent Diffusion Models (LDMs) have propelled them to the forefront of various generative tasks. However, their iterative sampling process poses a significant computational burden, resulting in slow generation speeds and limiting their application in text-to-audio generation deployment. In this work, we introduce AudioLCM, a novel consistency-based model tailored for efficie…

    Submitted 9 July, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

  40. arXiv:2405.16258  [pdf, other]

    cs.LG cs.AI eess.SY

    USD: Unsupervised Soft Contrastive Learning for Fault Detection in Multivariate Time Series

    Authors: Hong Liu, Xiuxiu Qiu, Yiming Shi, Zelin Zang

    Abstract: Unsupervised fault detection in multivariate time series is critical for maintaining the integrity and efficiency of complex systems, with current methodologies largely focusing on statistical and machine learning techniques. However, these approaches often rest on the assumption that data distributions conform to Gaussian models, overlooking the diversity of patterns that can manifest in both nor…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 19 pages, 7 figures, under review

  41. arXiv:2405.12996  [pdf, other]

    eess.IV

    Dose-aware Diffusion Model for 3D Low-dose PET: Multi-institutional Validation with Reader Study and Real Low-dose Data

    Authors: Huidong Xie, Weijie Gan, Bo Zhou, Ming-Kai Chen, Michal Kulon, Annemarie Boustani, Benjamin A. Spencer, Reimund Bayerlein, Xiongchao Chen, Qiong Liu, Xueqi Guo, Menghua Xia, Yinchi Zhou, Hui Liu, Liang Guo, Hongyu An, Ulugbek S. Kamilov, Hanzhong Wang, Biao Li, Axel Rominger, Kuangyu Shi, Ge Wang, Ramsey D. Badawi, Chi Liu

    Abstract: As PET imaging is accompanied by radiation exposure and potentially increased cancer risk, reducing radiation dose in PET scans without compromising the image quality is an important topic. Deep learning (DL) techniques have been investigated for low-dose PET imaging. However, existing models have often resulted in compromised image quality when achieving low-dose PET and have limited generalizabi…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 16 Pages, 15 Figures, 4 Tables. Paper under review. arXiv admin note: substantial text overlap with arXiv:2311.04248

  42. arXiv:2405.12609  [pdf, other]

    eess.AS cs.SD

    Mamba in Speech: Towards an Alternative to Self-Attention

    Authors: Xiangyu Zhang, Qiquan Zhang, Hexin Liu, Tianyi Xiao, Xinyuan Qian, Beena Ahmed, Eliathamby Ambikairajah, Haizhou Li, Julien Epps

    Abstract: Transformer and its derivatives have achieved success in diverse tasks across computer vision, natural language processing, and speech processing. To reduce the complexity of computations within the multi-head self-attention mechanism in Transformer, Selective State Space Models (i.e., Mamba) were proposed as an alternative. Mamba exhibited its effectiveness in natural language processing and comp…

    Submitted 30 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

  43. Evaluation of Connected Vehicle Identification-Aware Mixed Traffic Freeway Cooperative Merging

    Authors: Haoji Liu, Fatemeh Jahedinia, Zeyu Mu, B. Brian Park

    Abstract: Cooperative on-ramp merging control for connected automated vehicles (CAVs) has been extensively investigated. However, these studies neglect the connected vehicle identification process, which is a must for CAV cooperation. In this paper, we introduced a connected vehicle identification system (VIS) into the on-ramp merging control process for the first time and proposed an evaluation framework to as…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: ©2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

    Journal ref: 2024 IEEE Intelligent Vehicles Symposium (IV)

  44. arXiv:2405.12357  [pdf]

    eess.IV cs.CV

    Paired Conditional Generative Adversarial Network for Highly Accelerated Liver 4D MRI

    Authors: Di Xu, Xin Miao, Hengjie Liu, Jessica E. Scholey, Wensha Yang, Mary Feng, Michael Ohliger, Hui Lin, Yi Lao, Yang Yang, Ke Sheng

    Abstract: Purpose: 4D MRI with high spatiotemporal resolution is desired for image-guided liver radiotherapy. Acquiring densely sampled k-space data is time-consuming. Accelerated acquisition with sparse samples is desirable but often causes degraded image quality or long reconstruction time. We propose the Reconstruct Paired Conditional Generative Adversarial Network (Re-Con-GAN) to shorten the 4D MRI rec…

    Submitted 20 May, 2024; originally announced May 2024.

  45. arXiv:2405.10948  [pdf, other]

    cs.CV cs.AI cs.RO eess.IV

    Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery

    Authors: Guankun Wang, Long Bai, Wan Jun Nah, Jie Wang, Zhaoxi Zhang, Zhen Chen, Jinlin Wu, Mobarakol Islam, Hongbin Liu, Hongliang Ren

    Abstract: Recent advancements in Surgical Visual Question Answering (Surgical-VQA) and related region grounding have shown great promise for robotic and medical applications, addressing the critical need for automated methods in personalized surgical mentorship. However, existing models primarily provide simple structured answers and struggle with complex scenarios due to their limited capability in recogni…

    Submitted 22 March, 2024; originally announced May 2024.

  46. arXiv:2405.10606  [pdf, other]

    eess.SP

    Carrier Aggregation Enabled MIMO-OFDM Integrated Sensing and Communication

    Authors: Haotian Liu, Zhiqing Wei, Jinghui Piao, Huici Wu, Xingwang Li, Zhiyong Feng

    Abstract: In the evolution towards the forthcoming era of sixth-generation (6G) mobile communication systems characterized by ubiquitous intelligence, integrated sensing and communication (ISAC) is in a phase of burgeoning development. However, the capabilities of communication and sensing within a single frequency band fall short of meeting the escalating demands. To this end, this paper introduces a carrier…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 13 pages, 9 figures, submitted to IEEE Transactions on Wireless Communications

  47. arXiv:2405.09179  [pdf, other]

    eess.SP

    Integrated Sensing and Communication Enabled Cooperative Passive Sensing Using Mobile Communication System

    Authors: Zhiqing Wei, Haotian Liu, Hujun Li, Wangjun Jiang, Zhiyong Feng, Huici Wu, Ping Zhang

    Abstract: Integrated sensing and communication (ISAC) is a potential technology of the sixth-generation (6G) mobile communication system, which enables communication base station (BS) with sensing capability. However, the performance of single-BS sensing is limited, which can be overcome by multi-BS cooperative sensing. There are three types of multi-BS cooperative sensing, including cooperative active sens…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 16 pages, 11 figures, Submitted to IEEE Transactions on Mobile Computing

  48. arXiv:2405.02873  [pdf, other]

    eess.SP

    Target Localization with Macro and Micro Base Stations Cooperative Sensing

    Authors: Haotian Liu, Zhiqing Wei, Furong Yang, Huici Wu, Kaifeng Han, Zhiyong Feng

    Abstract: Addressing the communication and sensing demands of sixth-generation (6G) mobile communication system, integrated sensing and communication (ISAC) has garnered traction in academia and industry. With the sensing limitation of single base station (BS), multi-BS cooperative sensing is regarded as a promising solution. The coexistence and overlapped coverage of macro BS (MBS) and micro BS (MiBS) are…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: 7 pages, 6 figures, submitted to 2024 IEEE GLOBECOM

  49. arXiv:2405.02788  [pdf, other]

    eess.SP

    Antenna Failure Resilience: Deep Learning-Enabled Robust DOA Estimation with Single Snapshot Sparse Arrays

    Authors: Ruxin Zheng, Shunqiao Sun, Hongshan Liu, Honglei Chen, Mojtaba Soltanalian, Jian Li

    Abstract: Recent advancements in Deep Learning (DL) for Direction of Arrival (DOA) estimation have highlighted its superiority over traditional methods, offering faster inference, enhanced super-resolution, and robust performance in low Signal-to-Noise Ratio (SNR) environments. Despite these advancements, existing research predominantly focuses on multi-snapshot scenarios, a limitation in the context of aut…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: Invited paper for IEEE Asilomar conference 2024

  50. arXiv:2405.01104  [pdf, other]

    cs.IT eess.SP

    Multi-user ISAC through Stacked Intelligent Metasurfaces: New Algorithms and Experiments

    Authors: Ziqing Wang, Hongzheng Liu, Jianan Zhang, Rujing Xiong, Kai Wan, Xuewen Qian, Marco Di Renzo, Robert Caiming Qiu

    Abstract: This paper investigates a Stacked Intelligent Metasurfaces (SIM)-assisted Integrated Sensing and Communications (ISAC) system. An extended target model is considered, where the BS aims to estimate the complete target response matrix relative to the SIM. Under the constraints of minimum Signal-to-Interference-plus-Noise Ratio (SINR) for the communication users (CUs) and maximum transmit power, we j…

    Submitted 2 May, 2024; originally announced May 2024.