Showing 1–50 of 171 results for author: Zhao, H

Searching in archive eess.
  1. arXiv:2408.11885  [pdf, other]

    physics.med-ph eess.IV physics.optics

    HDN: Hybrid Deep-learning and Non-line-of-sight Reconstruction Framework for Photoacoustic Brain Imaging

    Authors: Pengcheng Wan, Fan Zhang, Yuting Shen, Xin Shang, Hulin Zhao, Shuangli Liu, Xiaohua Feng, Fei Gao

    Abstract: Photoacoustic imaging (PAI) combines the high contrast of optical imaging with the deep penetration depth of ultrasonic imaging, showing great potential in cerebrovascular disease detection. However, the ultrasonic wave suffers strong attenuation and multi-scattering when it passes through the skull tissue, resulting in the distortion of the collected photoacoustic (PA) signal. In this paper, insp…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: 8 pages, 8 figures

  2. arXiv:2408.03847  [pdf, other]

    eess.SY

    GAIA -- A Large Language Model for Advanced Power Dispatch

    Authors: Yuheng Cheng, Huan Zhao, Xiyuan Zhou, Junhua Zhao, Yuji Cao, Chao Yang

    Abstract: Power dispatch is essential for providing stable, cost-effective, and eco-friendly electricity to society. However, traditional methods falter as power systems grow in scale and complexity, struggling with multitasking, swift problem-solving, and human-machine collaboration. This paper introduces GAIA, the pioneering Large Language Model (LLM) tailored for power dispatch tasks. We have developed a…

    Submitted 7 August, 2024; originally announced August 2024.

  3. arXiv:2407.15903  [pdf, other]

    eess.IV

    Semantics Guided Disentangled GAN for Chest X-ray Image Rib Segmentation

    Authors: Lili Huang, Dexin Ma, Xiaowei Zhao, Chenglong Li, Haifeng Zhao, Jin Tang, Chuanfu Li

    Abstract: The label annotations for chest X-ray image rib segmentation are time consuming and laborious, and the labeling quality heavily relies on medical knowledge of annotators. To reduce the dependency on annotated data, existing works often utilize generative adversarial network (GAN) to generate training data. However, GAN-based methods overlook the nuanced information specific to individual organs, w…

    Submitted 22 July, 2024; originally announced July 2024.

  4. arXiv:2407.00808  [pdf]

    eess.SY cs.AI

    Exploring a Physics-Informed Decision Transformer for Distribution System Restoration: Methodology and Performance Analysis

    Authors: Hong Zhao, Jin Wei-Kocsis, Adel Heidari Akhijahani, Karen L Butler-Purry

    Abstract: Driven by advancements in sensing and computing, deep reinforcement learning (DRL)-based methods have demonstrated significant potential in effectively tackling distribution system restoration (DSR) challenges under uncertain operational scenarios. However, the data-intensive nature of DRL poses obstacles in achieving satisfactory DSR solutions for large-scale, complex distribution systems. Inspir…

    Submitted 30 June, 2024; originally announced July 2024.

  5. arXiv:2406.17286  [pdf]

    cs.RO eess.SY

    Prioritized experience replay-based DDQN for Unmanned Vehicle Path Planning

    Authors: Liu Lipeng, Letian Xu, Jiabei Liu, Haopeng Zhao, Tongzhou Jiang, Tianyao Zheng

    Abstract: Path planning module is a key module for autonomous vehicle navigation, which directly affects its operating efficiency and safety. In complex environments with many obstacles, traditional planning algorithms often cannot meet the needs of intelligence, which may lead to problems such as dead zones in unmanned vehicles. This paper proposes a path planning algorithm based on DDQN and combines it wi…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 4 pages, 6 figures, 2024 5th International Conference on Information Science, Parallel and Distributed Systems

  6. arXiv:2406.16878  [pdf, ps, other]

    eess.SP cs.AI cs.IT

    Benchmarking Semantic Communications for Image Transmission Over MIMO Interference Channels

    Authors: Yanhu Wang, Shuaishuai Guo, Anming Dong, Hui Zhao

    Abstract: Semantic communications offer promising prospects for enhancing data transmission efficiency. However, existing schemes have predominantly concentrated on point-to-point transmissions. In this paper, we aim to investigate the validity of this claim in interference scenarios compared to baseline approaches. Specifically, our focus is on general multiple-input multiple-output (MIMO) interference cha…

    Submitted 10 April, 2024; originally announced June 2024.

  7. arXiv:2406.15885  [pdf, other]

    cs.SD cs.AI eess.AS

    The Music Maestro or The Musically Challenged, A Massive Music Evaluation Benchmark for Large Language Models

    Authors: Jiajia Li, Lu Yang, Mingni Tang, Cong Chen, Zuchao Li, Ping Wang, Hai Zhao

    Abstract: Benchmark plays a pivotal role in assessing the advancements of large language models (LLMs). While numerous benchmarks have been proposed to evaluate LLMs' capabilities, there is a notable absence of a dedicated benchmark for assessing their musical abilities. To address this gap, we present ZIQI-Eval, a comprehensive and large-scale music benchmark specifically designed to evaluate the music-rel…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL-Findings 2024

  8. arXiv:2406.10895  [pdf, ps, other]

    eess.SP

    Fair Computation Offloading for RSMA-Assisted Mobile Edge Computing Networks

    Authors: Ding Xu, Lingjie Duan, Haitao Zhao, Hongbo Zhu

    Abstract: Rate splitting multiple access (RSMA) provides a flexible transmission framework that can be applied in mobile edge computing (MEC) systems. However, the research work on RSMA-assisted MEC systems is still at the infancy and many design issues remain unsolved, such as the MEC server and channel allocation problem in general multi-server and multi-channel scenarios as well as the user fairness issu…

    Submitted 1 August, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

  9. arXiv:2406.05763  [pdf, other]

    eess.AS

    WenetSpeech4TTS: A 12,800-hour Mandarin TTS Corpus for Large Speech Generation Model Benchmark

    Authors: Linhan Ma, Dake Guo, Kun Song, Yuepeng Jiang, Shuai Wang, Liumeng Xue, Weiming Xu, Huan Zhao, Binbin Zhang, Lei Xie

    Abstract: With the development of large text-to-speech (TTS) models and scale-up of the training data, state-of-the-art TTS systems have achieved impressive performance. In this paper, we present WenetSpeech4TTS, a multi-domain Mandarin corpus derived from the open-sourced WenetSpeech dataset. Tailored for the text-to-speech tasks, we refined WenetSpeech by adjusting segment boundaries, enhancing the audio…

    Submitted 19 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  10. arXiv:2406.05475  [pdf, other]

    cs.CV cs.GR eess.IV

    HDRT: Infrared Capture for HDR Imaging

    Authors: Jingchao Peng, Thomas Bashford-Rogers, Francesco Banterle, Haitao Zhao, Kurt Debattista

    Abstract: Capturing real world lighting is a long standing challenge in imaging and most practical methods acquire High Dynamic Range (HDR) images by either fusing multiple exposures, or boosting the dynamic range of Standard Dynamic Range (SDR) images. Multiple exposure capture is problematic as it requires longer capture times which can often lead to ghosting problems. The main alternative, inverse tone m…

    Submitted 8 June, 2024; originally announced June 2024.

  11. arXiv:2405.12589  [pdf]

    eess.SP eess.SY

    An Improved Robust Total Logistic Distance Metric algorithm for Generalized Gaussian Noise and Noisy Input

    Authors: Haiquan Zhao, Yi Peng, Zian Cao

    Abstract: Although the known maximum total generalized correntropy (MTGC) and generalized maximum Blake-Zisserman total correntropy (GMBZTC) algorithms can maintain good performance under the errors-in-variables (EIV) model disrupted by generalized Gaussian noise, their requirement for manual adjustment of parameters is excessive, greatly increasing the practical difficulty of use. To solve this problem, th…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 10 pages

    MSC Class: 94 ACM Class: C.2; F.2; H.4

  12. arXiv:2405.10705  [pdf, other]

    eess.IV cs.CV

    3D Vessel Reconstruction from Sparse-View Dynamic DSA Images via Vessel Probability Guided Attenuation Learning

    Authors: Zhentao Liu, Huangxuan Zhao, Wenhui Qin, Zhenghong Zhou, Xinggang Wang, Wenping Wang, Xiaochun Lai, Chuansheng Zheng, Dinggang Shen, Zhiming Cui

    Abstract: Digital Subtraction Angiography (DSA) is one of the gold standards in vascular disease diagnosing. With the help of contrast agent, time-resolved 2D DSA images deliver comprehensive insights into blood flow information and can be utilized to reconstruct 3D vessel structures. Current commercial DSA systems typically demand hundreds of scanning views to perform reconstruction, resulting in substanti…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 12 pages, 13 figures, 5 tables

  13. arXiv:2405.03956  [pdf, other]

    cs.SD eess.AS

    Adaptive Speech Emotion Representation Learning Based On Dynamic Graph

    Authors: Yingxue Gao, Huan Zhao, Zixing Zhang

    Abstract: Graph representation learning has become a hot research topic due to its powerful nonlinear fitting capability in extracting representative node embeddings. However, for sequential data such as speech signals, most traditional methods merely focus on the static graph created within a sequence, and largely overlook the intrinsic evolving patterns of these data. This may reduce the efficiency of gra…

    Submitted 6 May, 2024; originally announced May 2024.

    Journal ref: published at ICASSP 2024

  14. arXiv:2404.19108  [pdf, other]

    cs.CV astro-ph.IM eess.IV

    Real-Time Convolutional Neural Network-Based Star Detection and Centroiding Method for CubeSat Star Tracker

    Authors: Hongrui Zhao, Michael F. Lembeck, Adrian Zhuang, Riya Shah, Jesse Wei

    Abstract: Star trackers are one of the most accurate celestial sensors used for absolute attitude determination. The devices detect stars in captured images and accurately compute their projected centroids on an imaging focal plane with subpixel precision. Traditional algorithms for star detection and centroiding often rely on threshold adjustments for star pixel detection and pixel brightness weighting for…

    Submitted 29 April, 2024; originally announced April 2024.

  15. arXiv:2404.07577  [pdf, other]

    cs.LG eess.SP

    Generating Comprehensive Lithium Battery Charging Data with Generative AI

    Authors: Lidang Jiang, Changyan Hu, Sibei Ji, Hang Zhao, Junxiong Chen, Ge He

    Abstract: In optimizing performance and extending the lifespan of lithium batteries, accurate state prediction is pivotal. Traditional regression and classification methods have achieved some success in battery state prediction. However, the efficacy of these data-driven approaches heavily relies on the availability and quality of public datasets. Additionally, generating electrochemical data predominantly…

    Submitted 11 April, 2024; originally announced April 2024.

  16. arXiv:2403.12028  [pdf, other]

    cs.CV cs.AI eess.IV

    Ultraman: Single Image 3D Human Reconstruction with Ultra Speed and Detail

    Authors: Mingjin Chen, Junhao Chen, Xiaojun Ye, Huan-ang Gao, Xiaoxue Chen, Zhaoxin Fan, Hao Zhao

    Abstract: 3D human body reconstruction has been a challenge in the field of computer vision. Previous methods are often time-consuming and difficult to capture the detailed appearance of the human body. In this paper, we propose a new method called Ultraman for fast reconstruction of textured 3D human models from a single image. Compared to existing techniques, Ultraman greatly improves the re…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Project Page: https://air-discover.github.io/Ultraman/

  17. arXiv:2403.11689  [pdf, other]

    eess.IV cs.CV

    MoreStyle: Relax Low-frequency Constraint of Fourier-based Image Reconstruction in Generalizable Medical Image Segmentation

    Authors: Haoyu Zhao, Wenhui Dong, Rui Yu, Zhou Zhao, Du Bo, Yongchao Xu

    Abstract: The task of single-source domain generalization (SDG) in medical image segmentation is crucial due to frequent domain shifts in clinical image datasets. To address the challenge of poor generalization across different domains, we introduce a Plug-and-Play module for data augmentation called MoreStyle. MoreStyle diversifies image styles by relaxing low-frequency constraints in Fourier space, guidin…

    Submitted 1 July, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: MICCAI 2024

  18. arXiv:2403.11672  [pdf, other]

    eess.IV cs.CV

    WIA-LD2ND: Wavelet-based Image Alignment for Self-supervised Low-Dose CT Denoising

    Authors: Haoyu Zhao, Yuliang Gu, Zhou Zhao, Bo Du, Yongchao Xu, Rui Yu

    Abstract: In clinical examinations and diagnoses, low-dose computed tomography (LDCT) is crucial for minimizing health risks compared with normal-dose computed tomography (NDCT). However, reducing the radiation dose compromises the signal-to-noise ratio, leading to degraded quality of CT images. To address this, we analyze LDCT denoising task based on experimental results from the frequency perspective, and…

    Submitted 1 July, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: MICCAI 2024

  19. arXiv:2403.04502  [pdf, other]

    cs.IT eess.SP

    Matched-filter Precoded Rate Splitting Multiple Access: A Simple and Energy-efficient Design

    Authors: Hui Zhao, Dirk Slock

    Abstract: We introduce an energy-efficient downlink rate splitting multiple access (RSMA) scheme, employing a simple matched filter (MF) for precoding. We consider a transmitter equipped with multiple antennas, serving several single-antenna users at the same frequency-time resource, each with distinct message requests. Within the conventional 1-layer RSMA framework, requested messages undergo splitting int…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: 7 pages, 6 figures

  20. arXiv:2403.01598  [pdf, other]

    eess.IV cs.AI cs.CV

    APISR: Anime Production Inspired Real-World Anime Super-Resolution

    Authors: Boyang Wang, Fengyu Yang, Xihang Yu, Chao Zhang, Hanbin Zhao

    Abstract: While real-world anime super-resolution (SR) has gained increasing attention in the SR community, existing methods still adopt techniques from the photorealistic domain. In this paper, we analyze the anime production workflow and rethink how to use characteristics of it for the sake of the real-world anime SR. First, we argue that video networks and datasets are not necessary for anime SR due to t…

    Submitted 4 April, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

  21. arXiv:2402.18867  [pdf, other]

    eess.SP cs.SI eess.SY

    Message-Enhanced DeGroot Model

    Authors: Huisheng Wang, Zhanjiang Chen, H. Vicky Zhao

    Abstract: Understanding the impact of messages on agents' opinions over social networks is important. However, to our best knowledge, there has been limited quantitative investigation into this phenomenon in the prior works. To address this gap, this paper proposes the Message-Enhanced DeGroot model. The Bounded Brownian Message model provides a quantitative description of the message evolution, jointly con…

    Submitted 29 February, 2024; originally announced February 2024.

  22. arXiv:2402.11294  [pdf, other]

    cs.IT eess.SP

    Power Optimization for Integrated Active and Passive Sensing in DFRC Systems

    Authors: Xingliang Lou, Wenchao Xia, Kai-Kit Wong, Haitao Zhao, Tony Q. S. Quek, Hongbo Zhu

    Abstract: Most existing works on dual-function radar-communication (DFRC) systems mainly focus on active sensing, but ignore passive sensing. To leverage multi-static sensing capability, we explore integrated active and passive sensing (IAPS) in DFRC systems to remedy sensing performance. The multi-antenna base station (BS) is responsible for communication and active sensing by transmitting signals to user…

    Submitted 17 February, 2024; originally announced February 2024.

  23. arXiv:2402.07485  [pdf, other]

    cs.SD eess.AS

    MINT: Boosting Audio-Language Model via Multi-Target Pre-Training and Instruction Tuning

    Authors: Hang Zhao, Yifei Xin, Zhesong Yu, Bilei Zhu, Lu Lu, Zejun Ma

    Abstract: In the realm of audio-language pre-training (ALP), the challenge of achieving cross-modal alignment is significant. Moreover, the integration of audio inputs with diverse distributions and task variations poses challenges in developing generic audio-language models. In this study, we present MINT, a novel ALP framework boosting audio-language models through multi-target pre-training and instructio…

    Submitted 11 June, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

  24. arXiv:2402.01808  [pdf, other]

    cs.SD eess.AS

    KS-Net: Multi-band joint speech restoration and enhancement network for 2024 ICASSP SSI Challenge

    Authors: Guochen Yu, Runqiang Han, Chenglin Xu, Haoran Zhao, Nan Li, Chen Zhang, Xiguang Zheng, Chao Zhou, Qi Huang, Bing Yu

    Abstract: This paper presents the speech restoration and enhancement system created by the 1024K team for the ICASSP 2024 Speech Signal Improvement (SSI) Challenge. Our system consists of a generative adversarial network (GAN) in complex-domain for speech restoration and a fine-grained multi-band fusion module for speech enhancement. In the blind test set of SSI, the proposed system achieves an overall mean…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: Accepted to ICASSP 2024; Rank 1st in ICASSP 2024 Speech Signal Improvement (SSI) Challenge

  25. arXiv:2402.01523  [pdf]

    eess.SY

    Active Support of Inverters for Improving Short-Term Voltage Security in 100% IBRs-Penetrated Power Systems

    Authors: Yinhong Lin, Bin Wang, Qinglai Guo, Haotian Zhao, Hongbin Sun

    Abstract: Due to the energy crisis and environmental pollution, the installed capacity of inverter-based resources (IBRs) in power grids is rapidly increasing, and grid-following control (GFL) is the most prevalent at present. Meanwhile, grid-forming control-based (GFM) devices have been installed in the grid to provide active support for frequency and voltage. In the future GFL devices combined with GFM wi…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: 10 pages, 13 figures

  26. arXiv:2401.16958  [pdf, ps, other]

    cs.IT eess.SP

    Exact SINR Analysis of Matched-filter Precoder

    Authors: Hui Zhao, Dirk Slock, Petros Elia

    Abstract: This paper answers a fundamental question about the exact distribution of the signal-to-interference-plus-noise ratio (SINR) under matched-filter (MF) precoding. Specifically, we derive the exact expressions for the cumulative distribution function (CDF) and the probability density function (PDF) of SINR under MF precoding over Rayleigh fading channels. Based on the exact analysis, we then rigorou…

    Submitted 30 January, 2024; originally announced January 2024.

    Comments: 6 pages, 3 figures

  27. arXiv:2401.15993  [pdf, other]

    cs.SD eess.AS

    Continuous Target Speech Extraction: Enhancing Personalized Diarization and Extraction on Complex Recordings

    Authors: He Zhao, Hangting Chen, Jianwei Yu, Yuehai Wang

    Abstract: Target speaker extraction (TSE) aims to extract the target speaker's voice from the input mixture. Previous studies have concentrated on high-overlapping scenarios. However, real-world applications usually meet more complex scenarios like variable speaker overlapping and target speaker absence. In this paper, we introduce a framework to perform continuous TSE (C-TSE), comprising a target speaker…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: 8 pages, 6 figures

  28. arXiv:2401.09800  [pdf]

    eess.SY

    Power System Fault Diagnosis with Quantum Computing and Efficient Gate Decomposition

    Authors: Xiang Fei, Huan Zhao, Xiyuan Zhou, Junhua Zhao, Ting Shu, Fushuan Wen

    Abstract: Power system fault diagnosis is crucial for identifying the location and causes of faults and providing decision-making support for power dispatchers. However, most classical methods suffer from significant time-consuming, memory overhead, and computational complexity issues as the scale of the power system concerned increases. With rapid development of quantum computing technology, the combinator…

    Submitted 18 January, 2024; originally announced January 2024.

  29. arXiv:2401.07183  [pdf, other]

    eess.SY math.OC q-fin.MF q-fin.PM

    Optimal Investment with Herd Behaviour Using Rational Decision Decomposition

    Authors: Huisheng Wang, H. Vicky Zhao

    Abstract: In this paper, we study the optimal investment problem considering the herd behaviour between two agents, including one leading expert and one following agent whose decisions are influenced by those of the leading expert. In the objective functional of the optimal investment problem, we introduce the average deviation term to measure the distance between the two agents' decisions and use the varia…

    Submitted 15 July, 2024; v1 submitted 13 January, 2024; originally announced January 2024.

  30. arXiv:2401.02555  [pdf, other]

    cs.CE eess.SY math.DS stat.AP

    Data-Driven Estimation of Failure Probabilities in Correlated Structure-Preserving Stochastic Power System Models

    Authors: Hongli Zhao, Tyler E. Maltba, D. Adrian Maldonado, Emil Constantinescu, Mihai Anitescu

    Abstract: We propose a data-driven approach for propagating uncertainty in stochastic power grid simulations and apply it to the estimation of transmission line failure probabilities. A reduced-order equation governing the evolution of the observed line energy probability density function is derived from the Fokker-Planck equation of the full-order continuous Markov process. Our method consists of estimate…

    Submitted 4 January, 2024; originally announced January 2024.

    Comments: 12 pages, 6 figures, and 1 table

  31. Exploiting Multipath Information for Integrated Localization and Sensing via PHD Filtering

    Authors: Yinuo Du, Hanying Zhao, Yang Liu, Xinlei Yu, Yuan Shen

    Abstract: Accurate localization and perception are pivotal for enhancing the safety and reliability of vehicles. However, current localization methods suffer from reduced accuracy when the line-of-sight (LOS) path is obstructed, or a combination of reflections and scatterings is present. In this paper, we present an integrated localization and sensing method that delivers superior performance in complex env…

    Submitted 15 August, 2024; v1 submitted 24 December, 2023; originally announced December 2023.

    Comments: 6 pages, 6 figures. This work has been accepted and published by the IEEE Transactions on Vehicular Technology (2024)

  32. arXiv:2312.08089  [pdf, other]

    eess.AS

    Audio Deepfake Detection with Self-Supervised WavLM and Multi-Fusion Attentive Classifier

    Authors: Yinlin Guo, Haofan Huang, Xi Chen, He Zhao, Yuehai Wang

    Abstract: With the rapid development of speech synthesis and voice conversion technologies, Audio Deepfake has become a serious threat to the Automatic Speaker Verification (ASV) system. Numerous countermeasures are proposed to detect this type of attack. In this paper, we report our efforts to combine the self-supervised WavLM model and Multi-Fusion Attentive classifier for audio deepfake detection. Our me…

    Submitted 9 January, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

    Comments: Accepted to ICASSP 2024. 5 pages, 1 figure

  33. arXiv:2312.07911  [pdf]

    eess.IV cs.CV

    Projective Parallel Single-Pixel Imaging: 3D Structured Light Scanning Under Global Illumination

    Authors: Yuxi Li, Hongzhi Jiang, Huijie Zhao, Xudong Li

    Abstract: We present projective parallel single-pixel imaging (pPSI), a 3D photography method that provides a robust and efficient way to analyze the light transport behavior and enables separation of light effect due to global illumination, thereby achieving 3D structured light scanning under global illumination. The light transport behavior is described by the light transport coefficients (LTC), which con…

    Submitted 13 December, 2023; originally announced December 2023.

    Comments: 21 pages, 13 figures

  34. arXiv:2312.05256  [pdf, other]

    eess.IV cs.AI

    Holistic Evaluation of GPT-4V for Biomedical Imaging

    Authors: Zhengliang Liu, Hanqi Jiang, Tianyang Zhong, Zihao Wu, Chong Ma, Yiwei Li, Xiaowei Yu, Yutong Zhang, Yi Pan, Peng Shu, Yanjun Lyu, Lu Zhang, Junjie Yao, Peixin Dong, Chao Cao, Zhenxiang Xiao, Jiaqi Wang, Huan Zhao, Shaochen Xu, Yaonai Wei, Jingyuan Chen, Haixing Dai, Peilong Wang, Hao He, Zewei Wang, et al. (25 additional authors not shown)

    Abstract: In this paper, we present a large-scale evaluation probing GPT-4V's capabilities and limitations for biomedical image analysis. GPT-4V represents a breakthrough in artificial general intelligence (AGI) for computer vision, with applications in the biomedical domain. We assess GPT-4V's performance across 16 medical imaging categories, including radiology, oncology, ophthalmology, pathology, and mor…

    Submitted 10 November, 2023; originally announced December 2023.

  35. arXiv:2312.04131  [pdf, other]

    eess.AS cs.SD

    Joint Training or Not: An Exploration of Pre-trained Speech Models in Audio-Visual Speaker Diarization

    Authors: Huan Zhao, Li Zhang, Yue Li, Yannan Wang, Hongji Wang, Wei Rao, Qing Wang, Lei Xie

    Abstract: The scarcity of labeled audio-visual datasets is a constraint for training superior audio-visual speaker diarization systems. To improve the performance of audio-visual speaker diarization, we leverage pre-trained supervised and self-supervised speech models for audio-visual speaker diarization. Specifically, we adopt supervised (ResNet and ECAPA-TDNN) and self-supervised pre-trained models (WavLM…

    Submitted 7 December, 2023; originally announced December 2023.

  36. arXiv:2311.13361  [pdf, other]

    cs.AI cs.HC eess.SY

    Applying Large Language Models to Power Systems: Potential Security Threats

    Authors: Jiaqi Ruan, Gaoqi Liang, Huan Zhao, Guolong Liu, Xianzhuo Sun, Jing Qiu, Zhao Xu, Fushuan Wen, Zhao Yang Dong

    Abstract: Applying large language models (LLMs) to modern power systems presents a promising avenue for enhancing decision-making and operational efficiency. However, this action may also incur potential security threats, which have not been fully recognized so far. To this end, this article analyzes potential threats incurred by applying LLMs to power systems, emphasizing the need for urgent research and d…

    Submitted 24 January, 2024; v1 submitted 22 November, 2023; originally announced November 2023.

  37. arXiv:2311.06825  [pdf, ps, other]

    cs.IT eess.SP

    Secure Rate-Splitting Multiple Access Transmissions in LMS Systems

    Authors: Minjue He, Hui Zhao, Xiaqing Miao, Shuai Wang, Gaofeng Pan

    Abstract: This letter investigates the secure delivery performance of the rate-splitting multiple access scheme in land mobile satellite (LMS) systems, considering that the private messages intended by a terminal can be eavesdropped by any others from the broadcast signals. Specifically, the considered system has an N-antenna satellite and numerous single-antenna land users. Maximum ratio transmission (MRT)…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: 5 pages, 3 figures, 1 table

  38. arXiv:2310.07511  [pdf]

    cs.CV cs.LG eess.IV

    A Unified Remote Sensing Anomaly Detector Across Modalities and Scenes via Deviation Relationship Learning

    Authors: Jingtao Li, Xinyu Wang, Hengwei Zhao, Liangpei Zhang, Yanfei Zhong

    Abstract: Remote sensing anomaly detector can find the objects deviating from the background as potential targets. Given the diversity in earth anomaly types, a unified anomaly detector across modalities and scenes should be cost-effective and flexible to new earth observation sources and anomaly types. However, the current anomaly detectors are limited to a single modality and single scene, since they aim…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: Journal paper

  39. arXiv:2310.06414  [pdf]

    cs.RO eess.SP eess.SY

    Plane Constraints Aided Multi-Vehicle Cooperative Positioning Using Factor Graph Optimization

    Authors: Chen Zhuang, Hongbo Zhao

    Abstract: The development of vehicle-to-vehicle (V2V) communication facilitates the study of cooperative positioning (CP) techniques for vehicular applications. The CP methods can improve the positioning availability and accuracy by inter-vehicle ranging and data exchange between vehicles. However, the inter-vehicle ranging can be easily interrupted due to many factors such as obstacles in-between two c…

    Submitted 10 October, 2023; originally announced October 2023.

    Comments: 14 pages, 16 figures, IEEE trans on ITS

  40. arXiv:2310.01176  [pdf, other]

    eess.IV cs.CV

    Cross-adversarial local distribution regularization for semi-supervised medical image segmentation

    Authors: Thanh Nguyen-Duc, Trung Le, Roland Bammer, He Zhao, Jianfei Cai, Dinh Phung

    Abstract: Medical semi-supervised segmentation is a technique where a model is trained to segment objects of interest in medical images with limited annotated data. Existing semi-supervised segmentation methods are usually based on the smoothness assumption. This assumption implies that the model output distributions of two similar data samples are encouraged to be invariant. In other words, the smoothness…

    Submitted 2 October, 2023; originally announced October 2023.

    Comments: MICCAI 2023

  41. Scalable Neural Dynamic Equivalence for Power Systems

    Authors: Qing Shen, Yifan Zhou, Huanfeng Zhao, Peng Zhang, Qiang Zhang, Slava Maslenniko, Xiaochuan Luo

    Abstract: Traditional grid analytics are model-based, relying strongly on accurate models of power systems, especially the dynamic models of generators, controllers, loads and other dynamic components. However, acquiring thorough power system models can be impractical in real operation due to inaccessible system parameters and privacy of consumers, which necessitate data-driven dynamic equivalencing of unkn…

    Submitted 21 March, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

    Journal ref: IEEE Access, vol. 12, pp. 86513-86522, 2024

  42. arXiv:2309.09577  [pdf, ps, other]

    eess.SP

    Adaptive Unscented Kalman Filter under Minimum Error Entropy with Fiducial Points for Non-Gaussian Systems

    Authors: Boyu Tian, Haiquan Zhao

    Abstract: The minimum error entropy (MEE) has been extensively used in unscented Kalman filter (UKF) to handle impulsive noises or abnormal measurement data in non-Gaussian systems. However, the MEE-UKF has poor numerical stability due to the inverse operation of singular matrix. In this paper, a novel UKF based on minimum error entropy with fiducial points (MEEF) is proposed to improve th…

    Submitted 18 September, 2023; originally announced September 2023.

    Comments: 29 pages, 6 figures

    MSC Class: 94-10; 94-05 ACM Class: H.1.1; H.4.3

    Journal ref: Automatica (March 22, 2022)

  43. arXiv:2309.08835  [pdf

    eess.SP cs.LG cs.NE cs.RO

    Intelligent machines work in unstructured environments by differential neuromorphic computing

    Authors: Shengbo Wang, Shuo Gao, Chenyu Tang, Edoardo Occhipinti, Cong Li, Shurui Wang, Jiaqi Wang, Hubin Zhao, Guohua Hu, Arokia Nathan, Ravinder Dahiya, Luigi Occhipinti

    Abstract: Efficient operation of intelligent machines in the real world requires methods that allow them to understand and predict the uncertainties presented by the unstructured environments with good accuracy, scalability and generalization, similar to humans. Current methods rely on pretrained networks instead of continuously learning from the dynamic signal properties of working environments and suffer…

    Submitted 17 November, 2023; v1 submitted 15 September, 2023; originally announced September 2023.

    Comments: 16 pages, 5 figures

    Journal ref: Nat Commun, vol. 15, no. 1, p. 4671, May 2024

  44. arXiv:2309.05370  [pdf, other

    cs.SI eess.SP

    Opinion Dynamics in Two-Step Process: Message Sources, Opinion Leaders and Normal Agents

    Authors: Huisheng Wang, Yuejiang Li, Yiqing Lin, H. Vicky Zhao

    Abstract: According to mass media theory, the dissemination of messages and the evolution of opinions in social networks follow a two-step process. First, opinion leaders receive the message from the message sources, and then they transmit their opinions to normal agents. However, most opinion models only consider the evolution of opinions within a single network, which fails to capture the two-step process…

    Submitted 11 September, 2023; originally announced September 2023.

  45. arXiv:2309.04670  [pdf

    eess.SP cs.SD eess.AS eess.IV eess.SY

    Generalized Minimum Error with Fiducial Points Criterion for Robust Learning

    Authors: Haiquan Zhao, Yuan Gao, Yingying Zhu

    Abstract: The conventional Minimum Error Entropy criterion (MEE) has its limitations, showing reduced sensitivity to error mean values and uncertainty regarding error probability density function locations. To overcome this, an MEE with fiducial points criterion (MEEF) was presented. However, the efficacy of the MEEF is not consistent due to its reliance on a fixed Gaussian kernel. In this paper, a generali…

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: 12 pages, 9 figures

    ACM Class: I.5.3; I.5.4; I.4.9

  46. arXiv:2309.04654  [pdf, other

    cs.SD eess.AS

    Mask-CTC-based Encoder Pre-training for Streaming End-to-End Speech Recognition

    Authors: Huaibo Zhao, Yosuke Higuchi, Yusuke Kida, Tetsuji Ogawa, Tetsunori Kobayashi

    Abstract: Achieving high accuracy with low latency has always been a challenge in streaming end-to-end automatic speech recognition (ASR) systems. By attending to more future contexts, a streaming ASR model achieves higher accuracy but results in larger latency, which hurts the streaming performance. In the Mask-CTC framework, an encoder network is trained to learn the feature representation that anticipate…

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: Accepted to EUSIPCO 2023

  47. arXiv:2309.02318  [pdf, other

    cs.CV eess.IV

    TiAVox: Time-aware Attenuation Voxels for Sparse-view 4D DSA Reconstruction

    Authors: Zhenghong Zhou, Huangxuan Zhao, Jiemin Fang, Dongqiao Xiang, Lei Chen, Lingxia Wu, Feihong Wu, Wenyu Liu, Chuansheng Zheng, Xinggang Wang

    Abstract: Four-dimensional Digital Subtraction Angiography (4D DSA) plays a critical role in the diagnosis of many medical diseases, such as Arteriovenous Malformations (AVM) and Arteriovenous Fistulas (AVF). Despite its significant application value, the reconstruction of 4D DSA demands numerous views to effectively model the intricate vessels and radiocontrast flow, thereby implying a significant radiatio…

    Submitted 19 December, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

    Comments: 10 pages, 8 figures

  48. arXiv:2308.13205  [pdf, other

    cs.RO eess.SY

    Design and Control of a Bio-inspired Wheeled Bipedal Robot

    Authors: Haizhou Zhao, Lei Yu, Siying Qin, Gumin Jin, Yuqing Chen

    Abstract: Wheeled bipedal robots (WBRs) have the capability to execute agile and versatile locomotion tasks. This paper focuses on improving the dynamic performance of WBRs through innovations in both hardware and software development. Inspired by the human barbell squat, a bionic mechanical design is proposed and implemented as shown in Fig. 1. It distributes the weight onto its hip and knee joints to impr…

    Submitted 16 July, 2024; v1 submitted 25 August, 2023; originally announced August 2023.

  49. arXiv:2308.08125  [pdf, other

    cs.SD cs.CL cs.HC eess.AS

    Radio2Text: Streaming Speech Recognition Using mmWave Radio Signals

    Authors: Running Zhao, Jiangtao Yu, Hang Zhao, Edith C. H. Ngai

    Abstract: Millimeter wave (mmWave) based speech recognition opens up more possibilities for audio-related applications, such as conference speech transcription and eavesdropping. However, considering the practicality in real scenarios, latency and recognizable vocabulary size are two critical factors that cannot be overlooked. In this paper, we propose Radio2Text, the first mmWave-based system for streaming a…

    Submitted 15 August, 2023; originally announced August 2023.

    Comments: Accepted by Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (ACM IMWUT/UbiComp 2023)

  50. arXiv:2308.04930  [pdf, other

    eess.SP

    Striking The Right Balance: Three-Dimensional Ocean Sound Speed Field Reconstruction Using Tensor Neural Networks

    Authors: Siyuan Li, Lei Cheng, Ting Zhang, Hangfang Zhao, Jianlong Li

    Abstract: Accurately reconstructing a three-dimensional ocean sound speed field (3D SSF) is essential for various ocean acoustic applications, but the sparsity and uncertainty of sound speed samples across a vast ocean region make it a challenging task. To tackle this challenge, a large body of reconstruction methods has been developed, including spline interpolation, matrix/tensor-based completion, and dee…

    Submitted 9 August, 2023; originally announced August 2023.