Showing 1–50 of 308 results for author: Xue, Z

Searching in archive eess.

  1. arXiv:2408.14585  [pdf, other]

    cs.CV cs.SD eess.AS

    Global-Local Distillation Network-Based Audio-Visual Speaker Tracking with Incomplete Modalities

    Authors: Yidi Li, Yihan Li, Yixin Guo, Bin Ren, Zhenhuan Xu, Hao Guo, Hong Liu, Nicu Sebe

    Abstract: In speaker tracking research, integrating and complementing multi-modal data is a crucial strategy for improving the accuracy and robustness of tracking systems. However, tracking with incomplete modalities remains a challenging issue due to noisy observations caused by occlusion, acoustic noise, and sensor failures. Especially when there is missing data in multiple modalities, the performance of…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: Audio-Visual Speaker Tracking with Incomplete Modalities

  2. arXiv:2408.14453  [pdf]

    cs.LG eess.IV eess.SP

    Reconstructing physiological signals from fMRI across the adult lifespan

    Authors: Shiyu Wang, Ziyuan Xu, Yamin Li, Mara Mather, Roza G. Bayrak, Catie Chang

    Abstract: Interactions between the brain and body are of fundamental importance for human behavior and health. Functional magnetic resonance imaging (fMRI) captures whole-brain activity noninvasively, and modeling how fMRI signals interact with physiological dynamics of the body can provide new insight into brain function and offer potential biomarkers of disease. However, physiological recordings are not a…

    Submitted 26 August, 2024; originally announced August 2024.

  3. arXiv:2408.13470  [pdf, other]

    eess.SP

    Performance Analysis of Photon-Limited Free-Space Optical Communications with Practical Photon-Counting Receivers

    Authors: Chen Wang, Zhiyong Xu, Jingyuan Wang, Jianhua Li, Weifeng Mou, Huatao Zhu, Jiyong Zhao, Yang Su, Yimin Wang, Ailin Qi

    Abstract: The non-perfect factors of practical photon-counting receiver are recognized as a significant challenge for long-distance photon-limited free-space optical (FSO) communication systems. This paper presents a comprehensive analytical framework for modeling the statistical properties of time-gated single-photon avalanche diode (TG-SPAD) based photon-counting receivers in presence of dead time, non-ph…

    Submitted 24 August, 2024; originally announced August 2024.

  4. arXiv:2408.10800  [pdf, other]

    eess.SP

    A Novel Signal Detection Method for Photon-Counting Communications with Nonlinear Distortion Effects

    Authors: Chen Wang, Zhiyong Xu, Jingyuan Wang, Jianhua Li, Weifeng Mou, Huatao Zhu, Jiyong Zhao, Yang Su, Yimin Wang, Ailin Qi

    Abstract: This paper proposes a method for estimating and detecting optical signals in practical photon-counting receivers. There are two important aspects of non-perfect photon-counting receivers, namely, (i) dead time which results in blocking loss, and (ii) non-photon-number-resolving, which leads to counting loss during the gate-ON interval. These factors introduce nonlinear distortion to the detected p…

    Submitted 20 August, 2024; originally announced August 2024.

  5. arXiv:2408.09844  [pdf, ps, other]

    eess.SP

    Joint Beamforming and Power Control for D2D-Assisted Integrated Sensing and Communication Networks

    Authors: Zhenyu Xue, Yuang Chen, Hancheng Lu, Baolin Chong, Wanqing Long

    Abstract: Integrated sensing and communication (ISAC) is an emerging technology in next-generation communication networks. However, the communication performance of the ISAC system may be severely affected by interference from the radar system if the sensing task has demanding performance requirements. In this paper, we exploit device-to-device communication (D2D) to improve system communication capacity. T…

    Submitted 19 August, 2024; originally announced August 2024.

  6. arXiv:2408.06468  [pdf, other]

    cs.SD cs.MM eess.AS eess.SP

    FoVNet: Configurable Field-of-View Speech Enhancement with Low Computation and Distortion for Smart Glasses

    Authors: Zhongweiyang Xu, Ali Aroudi, Ke Tan, Ashutosh Pandey, Jung-Suk Lee, Buye Xu, Francesco Nesta

    Abstract: This paper presents a novel multi-channel speech enhancement approach, FoVNet, that enables highly efficient speech enhancement within a configurable field of view (FoV) of a smart-glasses user without needing specific target-talker(s) directions. It advances over prior works by enhancing all speakers within any given FoV, with a hybrid signal processing and deep learning approach designed with hi…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: Accepted by INTERSPEECH2024

  7. arXiv:2408.02085  [pdf, other]

    cs.CV cs.AI cs.CL eess.SP

    Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models

    Authors: Yulei Qin, Yuncheng Yang, Pengcheng Guo, Gang Li, Hang Shao, Yuchen Shi, Zihan Xu, Yun Gu, Ke Li, Xing Sun

    Abstract: Instruction tuning plays a critical role in aligning large language models (LLMs) with human preference. Despite the vast amount of open instruction datasets, naively training a LLM on all existing instructions may not be optimal and practical. To pinpoint the most beneficial datapoints, data assessment and selection methods have been proposed in the fields of natural language processing (NLP) and…

    Submitted 7 August, 2024; v1 submitted 4 August, 2024; originally announced August 2024.

    Comments: review, survey, 28 pages, 2 figures, 4 tables

  8. arXiv:2408.01553  [pdf, other]

    cs.CV eess.IV

    Multi-task SAR Image Processing via GAN-based Unsupervised Manipulation

    Authors: Xuran Hu, Mingzhe Zhu, Ziqiang Xu, Zhenpeng Feng, Ljubisa Stankovic

    Abstract: Generative Adversarial Networks (GANs) have shown tremendous potential in synthesizing a large number of realistic SAR images by learning patterns in the data distribution. Some GANs can achieve image editing by introducing latent codes, demonstrating significant promise in SAR image processing. Compared to traditional SAR image processing methods, editing based on GAN latent space control is enti…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 19 pages, 17 figures, 7 tables

  9. arXiv:2407.19736  [pdf, other]

    cs.LG eess.SP

    Sensor Selection via GFlowNets: A Deep Generative Modeling Framework to Navigate Combinatorial Complexity

    Authors: Spilios Evmorfos, Zhaoyi Xu, Athina Petropulu

    Abstract: The performance of sensor arrays in sensing and wireless communications improves with more elements, but this comes at the cost of increased energy consumption and hardware expense. This work addresses the challenge of selecting $k$ sensor elements from a set of $m$ to optimize a generic Quality-of-Service metric. Evaluating all $\binom{m}{k}$ possible sensor subsets is impractical, leading to pri…

    Submitted 29 July, 2024; originally announced July 2024.

  10. arXiv:2407.13632  [pdf, other]

    cs.CV cs.LG eess.IV

    Data Alchemy: Mitigating Cross-Site Model Variability Through Test Time Data Calibration

    Authors: Abhijeet Parida, Antonia Alomar, Zhifan Jiang, Pooneh Roshanitabrizi, Austin Tapp, Maria Ledesma-Carbayo, Ziyue Xu, Syed Muhammed Anwar, Marius George Linguraru, Holger R. Roth

    Abstract: Deploying deep learning-based imaging tools across various clinical sites poses significant challenges due to inherent domain shifts and regulatory hurdles associated with site-specific fine-tuning. For histopathology, stain normalization techniques can mitigate discrepancies, but they often fall short of eliminating inter-site variations. Therefore, we present Data Alchemy, an explainable stain n…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: accepted to Machine Learning in Medical Imaging (MLMI 2024)

  11. arXiv:2407.13211  [pdf]

    cs.CV eess.IV

    Research on Image Super-Resolution Reconstruction Mechanism based on Convolutional Neural Network

    Authors: Hao Yan, Zixiang Wang, Zhengjia Xu, Zhuoyue Wang, Zhizhong Wu, Ranran Lyu

    Abstract: Super-resolution reconstruction techniques entail the utilization of software algorithms to transform one or more sets of low-resolution images captured from the same scene into high-resolution images. In recent years, considerable advancement has been observed in the domain of single-image super-resolution algorithms, particularly those based on deep learning techniques. Nevertheless, the extract…

    Submitted 31 July, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

  12. arXiv:2407.10382  [pdf, other]

    cs.RO cs.AI cs.MA eess.SY math.OC

    Communication- and Computation-Efficient Distributed Decision-Making in Multi-Robot Networks

    Authors: Zirui Xu, Sandilya Sai Garimella, Vasileios Tzoumas

    Abstract: We provide a distributed coordination paradigm that enables scalable and near-optimal joint motion planning among multiple robots. Our coordination paradigm contrasts with current paradigms that are either near-optimal but impractical for replanning times or real-time but offer no near-optimality guarantees. We are motivated by the future of collaborative mobile autonomy, where distributed teams o…

    Submitted 14 July, 2024; originally announced July 2024.

  13. arXiv:2407.03307  [pdf, other]

    eess.IV cs.CV

    HoloHisto: End-to-end Gigapixel WSI Segmentation with 4K Resolution Sequential Tokenization

    Authors: Yucheng Tang, Yufan He, Vishwesh Nath, Pengfeig Guo, Ruining Deng, Tianyuan Yao, Quan Liu, Can Cui, Mengmeng Yin, Ziyue Xu, Holger Roth, Daguang Xu, Haichun Yang, Yuankai Huo

    Abstract: In digital pathology, the traditional method for deep learning-based image segmentation typically involves a two-stage process: initially segmenting high-resolution whole slide images (WSI) into smaller patches (e.g., 256x256, 512x512, 1024x1024) and subsequently reconstructing them to their original scale. This method often struggles to capture the complex details and vast scope of WSIs. In this…

    Submitted 3 July, 2024; originally announced July 2024.

  14. arXiv:2406.19043  [pdf]

    eess.IV cs.AI cs.CV cs.DB

    CMRxRecon2024: A Multi-Modality, Multi-View K-Space Dataset Boosting Universal Machine Learning for Accelerated Cardiac MRI

    Authors: Zi Wang, Fanwen Wang, Chen Qin, Jun Lyu, Ouyang Cheng, Shuo Wang, Yan Li, Mengyao Yu, Haoyu Zhang, Kunyuan Guo, Zhang Shi, Qirong Li, Ziqiang Xu, Yajing Zhang, Hao Li, Sha Hua, Binghua Chen, Longyu Sun, Mengting Sun, Qin Li, Ying-Hua Chu, Wenjia Bai, Jing Qin, Xiahai Zhuang, Claudia Prieto, et al. (7 additional authors not shown)

    Abstract: Cardiac magnetic resonance imaging (MRI) has emerged as a clinically gold-standard technique for diagnosing cardiac diseases, thanks to its ability to provide diverse information with multiple modalities and anatomical views. Accelerated cardiac MRI is highly expected to achieve time-efficient and patient-friendly imaging, and then advanced image reconstruction approaches are required to recover h…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 19 pages, 3 figures, 2 tables

  15. arXiv:2406.15716  [pdf, other]

    eess.IV cs.CV

    Predicting fluorescent labels in label-free microscopy images with pix2pix and adaptive loss in Light My Cells challenge

    Authors: Han Liu, Hao Li, Jiacheng Wang, Yubo Fan, Zhoubing Xu, Ipek Oguz

    Abstract: Fluorescence labeling is the standard approach to reveal cellular structures and other subcellular constituents for microscopy images. However, this invasive procedure may perturb or even kill the cells and the procedure itself is highly time-consuming and complex. Recently, in silico labeling has emerged as a promising alternative, aiming to use machine learning models to directly predict the flu…

    Submitted 21 June, 2024; originally announced June 2024.

  16. arXiv:2406.09272  [pdf, other]

    cs.CV cs.AI cs.SD eess.AS

    Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos

    Authors: Changan Chen, Puyuan Peng, Ami Baid, Zihui Xue, Wei-Ning Hsu, David Harwath, Kristen Grauman

    Abstract: Generating realistic audio for human actions is important for many applications, such as creating sound effects for films or virtual reality games. Existing approaches implicitly assume total correspondence between the video and audio during training, yet many sounds happen off-screen and have weak to no correspondence with the visuals -- resulting in uncontrolled ambient sounds or hallucinations…

    Submitted 25 July, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: Project page: https://vision.cs.utexas.edu/projects/action2sound. ECCV 2024 camera-ready version

  17. arXiv:2406.02640  [pdf, other]

    eess.IV physics.med-ph physics.optics

    Ghost imaging-based Non-contact Heart Rate Detection

    Authors: Jianming Yu, Yuchen He, Bin Li, Hui Chen, Huaibin Zheng, Jianbin Liu, Zhuo Xu

    Abstract: Remote heart rate measurement is an increasingly concerned research field, usually using remote photoplethysmography (rPPG) to collect heart rate information through video data collection. However, in certain specific scenarios (such as low light conditions, intense lighting, and non-line-of-sight situations), traditional imaging methods fail to capture image information effectively, that may lead…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 4 pages, 6 figures

  18. arXiv:2405.05579  [pdf]

    cs.HC eess.SY

    Intelligent EC Rearview Mirror: Enhancing Driver Safety with Dynamic Glare Mitigation via Cloud Edge Collaboration

    Authors: Junyi Yang, Zefei Xu, Huayi Lai, Hongjian Chen, Sifan Kong, Yutong Wu, Huan Yang

    Abstract: Sudden glare from trailing vehicles significantly increases driving safety risks. Existing anti-glare technologies such as electronic, manually-adjusted, and electrochromic rearview mirrors, are expensive and lack effective adaptability in different lighting conditions. To address these issues, our research introduces an intelligent rearview mirror system utilizing novel all-liquid electrochromic…

    Submitted 9 May, 2024; originally announced May 2024.

  19. arXiv:2405.04806  [pdf, other]

    eess.SY

    A leadless power transfer and wireless telemetry solutions for an endovascular electrocorticography

    Authors: Zhangyu Xu, Majid Khazaee, Nhan Duy Truong, Deniel Havenga, Armin Nikpour, Arman Ahnood, Omid Kavehei

    Abstract: Endovascular brain-computer interfaces (eBCIs) offer a minimally invasive way to connect the brain to external devices, merging neuroscience, engineering, and medical technology. Achieving wireless data and power transmission is crucial for the clinical viability of these implantable devices. Typically, solutions for endovascular electrocorticography (ECoG) include a sensing stent with multiple el…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 17 Pages, 12 figures

  20. arXiv:2404.05404  [pdf, other]

    eess.SY

    Contouring Error Bounded Control for Biaxial Switched Linear Systems

    Authors: Meng Yuan, Ye Wang, Chris Manzie, Zhezhuang Xu, Tianyou Chai

    Abstract: Biaxial motion control systems are used extensively in manufacturing and printing industries. To improve throughput and reduce machine cost, lightweight materials are being proposed in structural components but may result in higher flexibility in the machine links. This flexibility is often position dependent and compromises precision of the end effector of the machine. To address the need for imp…

    Submitted 8 April, 2024; originally announced April 2024.

  21. arXiv:2404.01082  [pdf, other]

    eess.IV

    The state-of-the-art in Cardiac MRI Reconstruction: Results of the CMRxRecon Challenge in MICCAI 2023

    Authors: Jun Lyu, Chen Qin, Shuo Wang, Fanwen Wang, Yan Li, Zi Wang, Kunyuan Guo, Cheng Ouyang, Michael Tänzer, Meng Liu, Longyu Sun, Mengting Sun, Qin Li, Zhang Shi, Sha Hua, Hao Li, Zhensen Chen, Zhenlin Zhang, Bingyu Xin, Dimitris N. Metaxas, George Yiasemis, Jonas Teuwen, Liping Zhang, Weitian Chen, Yidong Zhao, et al. (25 additional authors not shown)

    Abstract: Cardiac MRI, crucial for evaluating heart structure and function, faces limitations like slow imaging and motion artifacts. Undersampling reconstruction, especially data-driven algorithms, has emerged as a promising solution to accelerate scans and enhance imaging performance using highly under-sampled data. Nevertheless, the scarcity of publicly available cardiac k-space datasets and evaluation p…

    Submitted 16 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: 25 pages, 17 figures

  22. arXiv:2403.16361  [pdf, other]

    eess.IV cs.CV

    RSTAR: Rotational Streak Artifact Reduction in 4D CBCT using Separable and Circular Convolutions

    Authors: Ziheng Deng, Hua Chen, Haibo Hu, Zhiyong Xu, Jiayuan Sun, Tianling Lyu, Yan Xi, Yang Chen, Jun Zhao

    Abstract: Four-dimensional cone-beam computed tomography (4D CBCT) provides respiration-resolved images and can be used for image-guided radiation therapy. However, the ability to reveal respiratory motion comes at the cost of image artifacts. As raw projection data are sorted into multiple respiratory phases, the cone-beam projections become much sparser and the reconstructed 4D CBCT images will be covered…

    Submitted 22 August, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

  23. arXiv:2403.15716  [pdf, other]

    cs.RO cs.AI eess.SY

    Distributed Robust Learning based Formation Control of Mobile Robots based on Bioinspired Neural Dynamics

    Authors: Zhe Xu, Tao Yan, Simon X. Yang, S. Andrew Gadsden, Mohammad Biglarbegian

    Abstract: This paper addresses the challenges of distributed formation control in multiple mobile robots, introducing a novel approach that enhances real-world practicability. We first introduce a distributed estimator using a variable structure and cascaded design technique, eliminating the need for derivative information to improve the real time performance. Then, a kinematic tracking control method is de…

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: This paper is accepted by IEEE Transactions on Intelligent Vehicles

  24. arXiv:2403.13601  [pdf, other]

    eess.SY

    Lattice piecewise affine approximation of explicit model predictive control with application to satellite attitude control

    Authors: Zhengqi Xu, Jun Xu, Ai-Guo Wu, Shuning Wang

    Abstract: Satellite attitude control is a crucial part of aerospace technology, and model predictive control (MPC) is one of the most promising controllers in this area, which will be less effective if real-time online optimization cannot be achieved. Explicit MPC converts the online calculation into a table lookup process, however the solution is difficult to obtain if the system dimension is high or the co…

    Submitted 20 March, 2024; originally announced March 2024.

  25. arXiv:2403.10012  [pdf, other]

    cs.CV cs.RO eess.IV physics.optics

    Real-World Computational Aberration Correction via Quantized Domain-Mixing Representation

    Authors: Qi Jiang, Zhonghua Yi, Shaohua Gao, Yao Gao, Xiaolong Qian, Hao Shi, Lei Sun, Zhijie Xu, Kailun Yang, Kaiwei Wang

    Abstract: Relying on paired synthetic data, existing learning-based Computational Aberration Correction (CAC) methods are confronted with the intricate and multifaceted synthetic-to-real domain gap, which leads to suboptimal performance in real-world applications. In this paper, in contrast to improving the simulation pipeline, we deliver a novel insight into real-world CAC from the perspective of Unsupervi…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Codes and datasets will be made publicly available at https://github.com/zju-jiangqi/QDMR

  26. arXiv:2403.09993  [pdf, other]

    cs.CV eess.IV

    TRG-Net: An Interpretable and Controllable Rain Generator

    Authors: Zhiqiang Pang, Hong Wang, Qi Xie, Deyu Meng, Zongben Xu

    Abstract: Exploring and modeling rain generation mechanism is critical for augmenting paired data to ease training of rainy image processing models. Against this task, this study proposes a novel deep learning based rain generator, which fully takes the physical generation mechanism underlying rains into consideration and well encodes the learning of the fundamental rain factors (i.e., shape, orientation, l…

    Submitted 29 April, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

  27. arXiv:2403.04549  [pdf, other]

    cs.CV eess.IV

    Explainable Face Verification via Feature-Guided Gradient Backpropagation

    Authors: Yuhang Lu, Zewei Xu, Touradj Ebrahimi

    Abstract: Recent years have witnessed significant advancement in face recognition (FR) techniques, with their applications widely spread in people's lives and security-sensitive areas. There is a growing need for reliable interpretations of decisions of such systems. Existing studies relying on various mechanisms have investigated the usage of saliency maps as an explanation approach, but suffer from differ…

    Submitted 7 March, 2024; originally announced March 2024.

  28. arXiv:2402.19013  [pdf, other]

    eess.SY

    Ultraviolet Positioning via TDOA: Error Analysis and System Prototype

    Authors: Shihui Yu, Chubing Lv, Yueke Yang, Yuchen Pan, Lei Sun, Juliang Cao, Ruihang Yu, Chen Gong, Wenqi Wu, Zhengyuan Xu

    Abstract: This work performs the design, real-time hardware realization, and experimental evaluation of a positioning system by ultra-violet (UV) communication under photon-level signal detection. The positioning is based on time-difference of arrival (TDOA) principle. Time division-based transmission of synchronization sequence from three transmitters with known positions is applied. We investigate the pos…

    Submitted 14 April, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

  29. arXiv:2402.06841  [pdf]

    eess.IV cs.CV

    Point cloud-based registration and image fusion between cardiac SPECT MPI and CTA

    Authors: Shaojie Tang, Penpen Miao, Xingyu Gao, Yu Zhong, Dantong Zhu, Haixing Wen, Zhihui Xu, Qiuyue Wei, Hongping Yao, Xin Huang, Rui Gao, Chen Zhao, Weihua Zhou

    Abstract: A method was proposed for the point cloud-based registration and image fusion between cardiac single photon emission computed tomography (SPECT) myocardial perfusion images (MPI) and cardiac computed tomography angiograms (CTA). Firstly, the left ventricle (LV) epicardial regions (LVERs) in SPECT and CTA images were segmented by using different U-Net neural networks trained to generate the point c…

    Submitted 9 February, 2024; originally announced February 2024.

  30. arXiv:2402.00744  [pdf, other]

    cs.SD cs.CL eess.AS

    BATON: Aligning Text-to-Audio Model with Human Preference Feedback

    Authors: Huan Liao, Haonan Han, Kai Yang, Tianjiao Du, Rui Yang, Zunnan Xu, Qinmei Xu, Jingquan Liu, Jiasheng Lu, Xiu Li

    Abstract: With the development of AI-Generated Content (AIGC), text-to-audio models are gaining widespread attention. However, it is challenging for these models to generate audio aligned with human preference due to the inherent information density of natural language and limited model understanding ability. To alleviate this issue, we formulate the BATON, a framework designed to enhance the alignment betw…

    Submitted 1 February, 2024; originally announced February 2024.

  31. arXiv:2402.00320  [pdf]

    eess.IV

    DARCS: Memory-Efficient Deep Compressed Sensing Reconstruction for Acceleration of 3D Whole-Heart Coronary MR Angiography

    Authors: Zhihao Xue, Fan Yang, Juan Gao, Zhuo Chen, Hao Peng, Chao Zou, Hang Jin, Chenxi Hu

    Abstract: Three-dimensional coronary magnetic resonance angiography (CMRA) demands reconstruction algorithms that can significantly suppress the artifacts from a heavily undersampled acquisition. While unrolling-based deep reconstruction methods have achieved state-of-the-art performance on 2D image reconstruction, their application to 3D reconstruction is hindered by the large amount of memory needed to tr…

    Submitted 2 February, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

    Comments: 10 pages, 8 figures

  32. arXiv:2401.15913  [pdf, other]

    eess.IV cs.CV cs.LG physics.flu-dyn stat.AP

    Vision-Informed Flow Image Super-Resolution with Quaternion Spatial Modeling and Dynamic Flow Convolution

    Authors: Qinglong Cao, Zhengqin Xu, Chao Ma, Xiaokang Yang, Yuntian Chen

    Abstract: Flow image super-resolution (FISR) aims at recovering high-resolution turbulent velocity fields from low-resolution flow images. Existing FISR methods mainly process the flow images in natural image patterns, while the critical and distinct flow visual properties are rarely considered. This negligence would cause the significant domain gap between flow and natural images to severely hamper the acc…

    Submitted 29 January, 2024; originally announced January 2024.

  33. arXiv:2401.10269  [pdf, ps, other]

    cs.IT eess.SP stat.ME

    Robust Multi-Sensor Multi-Target Tracking Using Possibility Labeled Multi-Bernoulli Filter

    Authors: Han Cai, Chenbao Xue, Jeremie Houssineau, Zhirun Xue

    Abstract: With the increasing complexity of multiple target tracking scenes, a single sensor may not be able to effectively monitor a large number of targets. Therefore, it is imperative to extend the single-sensor technique to Multi-Sensor Multi-Target Tracking (MSMTT) for enhanced functionality. Typical MSMTT methods presume complete randomness of all uncertain components, and therefore effective solution…

    Submitted 4 January, 2024; originally announced January 2024.

  34. arXiv:2401.03476  [pdf, other]

    cs.MM cs.AI cs.HC cs.SD eess.AS

    Freetalker: Controllable Speech and Text-Driven Gesture Generation Based on Diffusion Models for Enhanced Speaker Naturalness

    Authors: Sicheng Yang, Zunnan Xu, Haiwei Xue, Yongkang Cheng, Shaoli Huang, Mingming Gong, Zhiyong Wu

    Abstract: Current talking avatars mostly generate co-speech gestures based on audio and text of the utterance, without considering the non-speaking motion of the speaker. Furthermore, previous works on co-speech gesture generation have designed network structures based on individual gesture datasets, which results in limited data volume, compromised generalizability, and restricted speaker movements. To tac…

    Submitted 7 January, 2024; originally announced January 2024.

    Comments: 6 pages, 3 figures, ICASSP 2024

  35. arXiv:2401.03150  [pdf, other]

    eess.IV

    O-PRESS: Boosting OCT axial resolution with Prior guidance, Recurrence, and Equivariant Self-Supervision

    Authors: Kaiyan Li, Jingyuan Yang, Wenxuan Liang, Xingde Li, Chenxi Zhang, Lulu Chen, Chan Wu, Xiao Zhang, Zhiyan Xu, Yuelin Wang, Lihui Meng, Yue Zhang, Youxin Chen, S. Kevin Zhou

    Abstract: Optical coherence tomography (OCT) is a noninvasive technology that enables real-time imaging of tissue microanatomies. The axial resolution of OCT is intrinsically constrained by the spectral bandwidth of the employed light source while maintaining a fixed center wavelength for a specific application. Physically extending this bandwidth faces strong limitations and requires a substantial cost. We…

    Submitted 6 January, 2024; originally announced January 2024.

  36. arXiv:2401.03122  [pdf, other]

    cs.CV eess.IV

    SAR Despeckling via Regional Denoising Diffusion Probabilistic Model

    Authors: Xuran Hu, Ziqiang Xu, Zhihan Chen, Zhengpeng Feng, Mingzhe Zhu, LJubisa Stankovic

    Abstract: Speckle noise poses a significant challenge in maintaining the quality of synthetic aperture radar (SAR) images, so SAR despeckling techniques have drawn increasing attention. Despite the tremendous advancements of deep learning in fixed-scale SAR image despeckling, these methods still struggle to deal with large-scale SAR images. To address this problem, this paper introduces a novel despeckling…

    Submitted 5 January, 2024; originally announced January 2024.

    Comments: 5 pages, 5 figures

    ACM Class: I.4.4

  37. arXiv:2312.15863  [pdf, other]

    cs.LG cs.AI cs.RO eess.SY

    PDiT: Interleaving Perception and Decision-making Transformers for Deep Reinforcement Learning

    Authors: Hangyu Mao, Rui Zhao, Ziyue Li, Zhiwei Xu, Hao Chen, Yiqun Chen, Bin Zhang, Zhen Xiao, Junge Zhang, Jiangjin Yin

    Abstract: Designing better deep networks and better reinforcement learning (RL) algorithms are both important for deep RL. This work studies the former. Specifically, the Perception and Decision-making Interleaving Transformer (PDiT) network is proposed, which cascades two Transformers in a very natural way: the perceiving one focuses on \emph{the environmental perception} by processing the observation at t…

    Submitted 25 December, 2023; originally announced December 2023.

    Comments: Proc. of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2024, full paper with oral presentation). Cover our preliminary study: arXiv:2212.14538

  38. arXiv:2312.15701  [pdf, other]

    eess.IV cs.CV cs.LG

    Rotation Equivariant Proximal Operator for Deep Unfolding Methods in Image Restoration

    Authors: Jiahong Fu, Qi Xie, Deyu Meng, Zongben Xu

    Abstract: The deep unfolding approach has attracted significant attention in computer vision tasks, which well connects conventional image processing modeling manners with more recent deep learning techniques. Specifically, by establishing a direct correspondence between algorithm operators at each implementation step and network modules within each layer, one can rationally construct an almost ``white box'…

    Submitted 25 December, 2023; originally announced December 2023.

  39. arXiv:2312.13611  [pdf, other]

    cs.LG cs.NI eess.SP

    Topology Learning for Heterogeneous Decentralized Federated Learning over Unreliable D2D Networks

    Authors: Zheshun Wu, Zenglin Xu, Dun Zeng, Junfan Li, Jie Liu

    Abstract: With the proliferation of intelligent mobile devices in wireless device-to-device (D2D) networks, decentralized federated learning (DFL) has attracted significant interest. Compared to centralized federated learning (CFL), DFL mitigates the risk of central server failures due to communication bottlenecks. However, DFL faces several challenges, such as the severe heterogeneity of data distributions…

    Submitted 10 March, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: To appear in IEEE Transactions on Vehicular Technology

  40. arXiv:2312.05528  [pdf, other]

    eess.IV cs.CV

    Exploring 3D U-Net Training Configurations and Post-Processing Strategies for the MICCAI 2023 Kidney and Tumor Segmentation Challenge

    Authors: Kwang-Hyun Uhm, Hyunjun Cho, Zhixin Xu, Seohoon Lim, Seung-Won Jung, Sung-Hoo Hong, Sung-Jea Ko

    Abstract: In 2023, it is estimated that 81,800 kidney cancer cases will be newly diagnosed, and 14,890 people will die from this cancer in the United States. Preoperative dynamic contrast-enhanced abdominal computed tomography (CT) is often used for detecting lesions. However, there exists inter-observer variability due to subtle differences in the imaging features of kidney and kidney tumors. In this paper…

    Submitted 9 December, 2023; originally announced December 2023.

    Comments: MICCAI 2023, KITS 2023 challenge 2nd place

  41. arXiv:2312.03376  [pdf, other]

    eess.SY

    Beacon-enabled TDMA Ultraviolet Communication Network System Design and Realization

    Authors: Yuchen Pan, Fei Long, Ping Li, Haotian Shi, Jiazhao Shi, Hanlin Xiao, Chen Gong, Zhengyuan Xu

    Abstract: Non-line-of-sight (NLOS) ultraviolet (UV) scattering communication can serve as a good candidate for outdoor optical wireless communication (OWC) in the cases of non-perfect transmitter-receiver alignment and radio silence. We design and demonstrate an NLOS UV scattering communication network system in this paper, where a beacon-enabled time division multiple access (TDMA) scheme is adopted. In our…

    Submitted 15 April, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

  42. arXiv:2312.01573  [pdf]

    eess.IV cs.CV

    Survey on deep learning in multimodal medical imaging for cancer detection

    Authors: Yan Tian, Zhaocheng Xu, Yujun Ma, Weiping Ding, Ruili Wang, Zhihong Gao, Guohua Cheng, Linyang He, Xuran Zhao

    Abstract: The task of multimodal cancer detection is to determine the locations and categories of lesions by using different imaging techniques, which is one of the key research methods for cancer diagnosis. Recently, deep learning-based object detection has made significant developments due to its strength in semantic feature extraction and nonlinear function fitting. However, multimodal cancer detection r…

    Submitted 3 December, 2023; originally announced December 2023.

    Journal ref: Neural Computing and Applications. 2023 Nov 29:1-6

  43. arXiv:2311.18188  [pdf, other]

    eess.AS cs.LG

    Speech Understanding on Tiny Devices with A Learning Cache

    Authors: Afsara Benazir, Zhiming Xu, Felix Xiaozhu Lin

    Abstract: This paper addresses spoken language understanding (SLU) on microcontroller-like embedded devices, integrating on-device execution with cloud offloading in a novel fashion. We leverage temporal locality in the speech inputs to a device and reuse recent SLU inferences accordingly. Our idea is simple: let the device match incoming inputs against cached results, and only offload inputs not matched to…

    Submitted 8 May, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: accepted at MobiSys'24

  44. arXiv:2311.16572   

    eess.SY physics.ao-ph physics.soc-ph

    Adapting to climate change: Long-term impact of wind resource changes on China's power system resilience

    Authors: Jiaqi Ruan, Xiangrui Meng, Yifan Zhu, Gaoqi Liang, Xianzhuo Sun, Huayi Wu, Huijuan Xiao, Mengqian Lu, Pin Gao, Jiapeng Li, Wai-Kin Wong, Zhao Xu, Junhua Zhao

    Abstract: Modern society's reliance on power systems is at risk from the escalating effects of wind-related climate change. Yet, failure to identify the intricate relationship between wind-related climate risks and power systems could lead to serious short- and long-term issues, including partial or complete blackouts. Here, we develop a comprehensive framework to assess China's power system resilience acro…

    Submitted 24 January, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: Not suitable for publication

  45. arXiv:2311.14925  [pdf, other]

    cs.CV eess.IV

    Coordinate-based Neural Network for Fourier Phase Retrieval

    Authors: Tingyou Li, Zixin Xu, Yong S. Chu, Xiaojing Huang, Jizhou Li

    Abstract: Fourier phase retrieval is essential for high-definition imaging of nanoscale structures across diverse fields, notably coherent diffraction imaging. This study presents the Single impliCit neurAl Network (SCAN), a tool built upon coordinate neural networks meticulously designed for enhanced phase retrieval performance. Remedying the drawbacks of conventional iterative methods which are easily tr…

    Submitted 8 January, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

  46. arXiv:2311.13361  [pdf, other]

    cs.AI cs.HC eess.SY

    Applying Large Language Models to Power Systems: Potential Security Threats

    Authors: Jiaqi Ruan, Gaoqi Liang, Huan Zhao, Guolong Liu, Xianzhuo Sun, Jing Qiu, Zhao Xu, Fushuan Wen, Zhao Yang Dong

    Abstract: Applying large language models (LLMs) to modern power systems presents a promising avenue for enhancing decision-making and operational efficiency. However, this action may also incur potential security threats, which have not been fully recognized so far. To this end, this article analyzes potential threats incurred by applying LLMs to power systems, emphasizing the need for urgent research and d…

    Submitted 24 January, 2024; v1 submitted 22 November, 2023; originally announced November 2023.

  47. arXiv:2311.04234  [pdf]

    eess.SP cs.CV cs.LG

    Leveraging sinusoidal representation networks to predict fMRI signals from EEG

    Authors: Yamin Li, Ange Lou, Ziyuan Xu, Shiyu Wang, Catie Chang

    Abstract: In modern neuroscience, functional magnetic resonance imaging (fMRI) has been a crucial and irreplaceable tool that provides a non-invasive window into the dynamics of whole-brain activity. Nevertheless, fMRI is limited by hemodynamic blurring as well as high cost, immobility, and incompatibility with metal implants. Electroencephalography (EEG) is complementary to fMRI and can directly record the…

    Submitted 24 January, 2024; v1 submitted 5 November, 2023; originally announced November 2023.

  48. arXiv:2311.03653  [pdf, ps, other]

    cs.IT eess.SP

    On the Performance of LoRa Empowered Communication for Wireless Body Area Networks

    Authors: Minling Zhang, Guofa Cai, Zhiping Xu, Jiguang He, Markku Juntti

    Abstract: To remotely monitor the physiological status of the human body, long range (LoRa) communication has been considered as an eminently suitable candidate for wireless body area networks (WBANs). Typically, a Rayleigh-lognormal fading channel is encountered by the LoRa links of the WBAN. In this context, we characterize the performance of the LoRa system in WBAN scenarios with an emphasis on the physi…

    Submitted 6 November, 2023; originally announced November 2023.

  49. arXiv:2311.00483  [pdf, other]

    eess.IV cs.CV

    DEFN: Dual-Encoder Fourier Group Harmonics Network for Three-Dimensional Indistinct-Boundary Object Segmentation

    Authors: Xiaohua Jiang, Yihao Guo, Jian Huang, Yuting Wu, Meiyi Luo, Zhaoyang Xu, Qianni Zhang, Xingru Huang, Hong He, Shaowei Jiang, Jing Ye, Mang Xiao

    Abstract: The precise spatial and quantitative delineation of indistinct-boundary medical objects is paramount for the accuracy of diagnostic protocols, efficacy of surgical interventions, and reliability of postoperative assessments. Despite their significance, the effective segmentation and instantaneous three-dimensional reconstruction are significantly impeded by the paucity of representative samples in…

    Submitted 19 June, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

    Comments: 36 pages, 16 figures, 7 tables

    MSC Class: 68; 92 ACM Class: I.4; J.3

  50. arXiv:2310.14172  [pdf, other]

    eess.IV cs.CV

    ASC: Appearance and Structure Consistency for Unsupervised Domain Adaptation in Fetal Brain MRI Segmentation

    Authors: Zihang Xu, Haifan Gong, Xiang Wan, Haofeng Li

    Abstract: Automatic tissue segmentation of fetal brain images is essential for the quantitative analysis of prenatal neurodevelopment. However, producing voxel-level annotations of fetal brain imaging is time-consuming and expensive. To reduce labeling costs, we propose a practical unsupervised domain adaptation (UDA) setting that adapts the segmentation labels of high-quality fetal brain atlases to unlabel…

    Submitted 22 October, 2023; originally announced October 2023.

    Comments: MICCAI 2023, released code: https://github.com/lhaof/ASC