Zum Hauptinhalt springen

Showing 1–50 of 366 results for author: Li, L

Searching in archive eess. Search in all archives.
.
  1. arXiv:2408.13945  [pdf, other

    eess.IV cs.CV physics.med-ph

    Personalized Topology-Informed 12-Lead ECG Electrode Localization from Incomplete Cardiac MRIs for Efficient Cardiac Digital Twins

    Authors: Lei Li, Hannah Smith, Yilin Lyu, Julia Camps, Blanca Rodriguez, Abhirup Banerjee, Vicente Grau

    Abstract: Cardiac digital twins (CDTs) offer personalized \textit{in-silico} cardiac representations for the inference of multi-scale properties tied to cardiac mechanisms. The creation of CDTs requires precise information about the electrode position on the torso, especially for the personalized electrocardiogram (ECG) calibration. However, current studies commonly rely on additional acquisition of torso i… ▽ More

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: 12 pages

  2. arXiv:2408.13832  [pdf, other

    eess.IV cs.CV

    A Low-dose CT Reconstruction Network Based on TV-regularized OSEM Algorithm

    Authors: Ran An, Yinghui Zhang, Xi Chen, Lemeng Li, Ke Chen, Hongwei Li

    Abstract: Low-dose computed tomography (LDCT) offers significant advantages in reducing the potential harm to human bodies. However, reducing the X-ray dose in CT scanning often leads to severe noise and artifacts in the reconstructed images, which might adversely affect diagnosis. By utilizing the expectation maximization (EM) algorithm, statistical priors could be combined with artificial priors to improv… ▽ More

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: 11 pages, 8 figures

    ACM Class: I.4.5

  3. arXiv:2408.13487  [pdf, ps, other

    cs.LO eess.SY math.OC

    Towards Automatic Linearization via SMT Solving

    Authors: Jian Cao, Liyong Lin, Lele Li

    Abstract: Mathematical optimization is ubiquitous in modern applications. However, in practice, we often need to use nonlinear optimization models, for which the existing optimization tools such as Cplex or Gurobi may not be directly applicable and an (error-prone) manual transformation often has to be done. Thus, to address this issue, in this paper we investigate the problem of automatically verifying and… ▽ More

    Submitted 24 August, 2024; originally announced August 2024.

    Comments: 4 pages, conference

  4. arXiv:2408.05112  [pdf, other

    cs.LG cs.AI eess.IV

    Semantic Successive Refinement: A Generative AI-aided Semantic Communication Framework

    Authors: Kexin Zhang, Lixin Li, Wensheng Lin, Yuna Yan, Rui Li, Wenchi Cheng, Zhu Han

    Abstract: Semantic Communication (SC) is an emerging technology aiming to surpass the Shannon limit. Traditional SC strategies often minimize signal distortion between the original and reconstructed data, neglecting perceptual quality, especially in low Signal-to-Noise Ratio (SNR) environments. To address this issue, we introduce a novel Generative AI Semantic Communication (GSC) system for single-user scen… ▽ More

    Submitted 31 July, 2024; originally announced August 2024.

  5. arXiv:2408.04158  [pdf, other

    eess.IV cs.CV

    Efficient Single Image Super-Resolution with Entropy Attention and Receptive Field Augmentation

    Authors: Xiaole Zhao, Linze Li, Chengxing Xie, Xiaoming Zhang, Ting Jiang, Wenjie Lin, Shuaicheng Liu, Tianrui Li

    Abstract: Transformer-based deep models for single image super-resolution (SISR) have greatly improved the performance of lightweight SISR tasks in recent years. However, they often suffer from heavy computational burden and slow inference due to the complex calculation of multi-head self-attention (MSA), seriously hindering their practical application and deployment. In this work, we present an efficient S… ▽ More

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: Accepted to ACM MM 2024

  6. arXiv:2408.01696  [pdf, other

    cs.SD cs.AI eess.AS

    Generating High-quality Symbolic Music Using Fine-grained Discriminators

    Authors: Zhedong Zhang, Liang Li, Jiehua Zhang, Zhenghui Hu, Hongkui Wang, Chenggang Yan, Jian Yang, Yuankai Qi

    Abstract: Existing symbolic music generation methods usually utilize discriminator to improve the quality of generated music via global perception of music. However, considering the complexity of information in music, such as rhythm and melody, a single discriminator cannot fully reflect the differences in these two primary dimensions of music. In this work, we propose to decouple the melody and rhythm from… ▽ More

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: Accepted by ICPR2024

  7. arXiv:2408.01672  [pdf, ps, other

    eess.SP cs.AI

    radarODE: An ODE-Embedded Deep Learning Model for Contactless ECG Reconstruction from Millimeter-Wave Radar

    Authors: Yuanyuan Zhang, Runwei Guan, Lingxiao Li, Rui Yang, Yutao Yue, Eng Gee Lim

    Abstract: Radar-based contactless cardiac monitoring has become a popular research direction recently, but the fine-grained electrocardiogram (ECG) signal is still hard to reconstruct from millimeter-wave radar signal. The key obstacle is to decouple the cardiac activities in the electrical domain (i.e., ECG) from that in the mechanical domain (i.e., heartbeat), and most existing research only uses pure dat… ▽ More

    Submitted 3 August, 2024; originally announced August 2024.

  8. arXiv:2407.21507  [pdf, other

    cs.AI cs.LG eess.IV

    FSSC: Federated Learning of Transformer Neural Networks for Semantic Image Communication

    Authors: Yuna Yan, Xin Zhang, Lixin Li, Wensheng Lin, Rui Li, Wenchi Cheng, Zhu Han

    Abstract: In this paper, we address the problem of image semantic communication in a multi-user deployment scenario and propose a federated learning (FL) strategy for a Swin Transformer-based semantic communication system (FSSC). Firstly, we demonstrate that the adoption of a Swin Transformer for joint source-channel coding (JSCC) effectively extracts semantic information in the communication system. Next,… ▽ More

    Submitted 31 July, 2024; originally announced July 2024.

  9. arXiv:2407.19402  [pdf, other

    cs.CV eess.IV

    NVC-1B: A Large Neural Video Coding Model

    Authors: Xihua Sheng, Chuanbo Tang, Li Li, Dong Liu, Feng Wu

    Abstract: The emerging large models have achieved notable progress in the fields of natural language processing and computer vision. However, large models for neural video coding are still unexplored. In this paper, we try to explore how to build a large neural video coding model. Based on a small baseline model, we gradually scale up the model sizes of its different coding parts, including the motion encod… ▽ More

    Submitted 28 July, 2024; originally announced July 2024.

  10. Large Kernel Distillation Network for Efficient Single Image Super-Resolution

    Authors: Chengxing Xie, Xiaoming Zhang, Linze Li, Haiteng Meng, Tianlin Zhang, Tianrui Li, Xiaole Zhao

    Abstract: Efficient and lightweight single-image super-resolution (SISR) has achieved remarkable performance in recent years. One effective approach is the use of large kernel designs, which have been shown to improve the performance of SISR models while reducing their computational requirements. However, current state-of-the-art (SOTA) models still face problems such as high computational costs. To address… ▽ More

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: Accepted to CVPR workshop 2023

  11. arXiv:2407.13076  [pdf, other

    cs.MA cs.NI eess.SP

    Matching-Driven Deep Reinforcement Learning for Energy-Efficient Transmission Parameter Allocation in Multi-Gateway LoRa Networks

    Authors: Ziqi Lin, Xu Zhang, Shimin Gong, Lanhua Li, Zhou Su, Bo Gu

    Abstract: Long-range (LoRa) communication technology, distinguished by its low power consumption and long communication range, is widely used in the Internet of Things. Nevertheless, the LoRa MAC layer adopts pure ALOHA for medium access control, which may suffer from severe packet collisions as the network scale expands, consequently reducing the system energy efficiency (EE). To address this issue, it is… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  12. arXiv:2407.11541  [pdf, other

    eess.IV cs.CV

    Uniformly Accelerated Motion Model for Inter Prediction

    Authors: Zhuoyuan Li, Yao Li, Chuanbo Tang, Li Li, Dong Liu, Feng Wu

    Abstract: Inter prediction is a key technology to reduce the temporal redundancy in video coding. In natural videos, there are usually multiple moving objects with variable velocity, resulting in complex motion fields that are difficult to represent compactly. In Versatile Video Coding (VVC), existing inter prediction methods usually assume uniform speed motion between consecutive frames and use the linear… ▽ More

    Submitted 21 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: 5 pages, 4 figures

  13. arXiv:2407.10926  [pdf, other

    eess.IV cs.CV

    In-Loop Filtering via Trained Look-Up Tables

    Authors: Zhuoyuan Li, Jiacheng Li, Yao Li, Li Li, Dong Liu, Feng Wu

    Abstract: In-loop filtering (ILF) is a key technology for removing the artifacts in image/video coding standards. Recently, neural network-based in-loop filtering methods achieve remarkable coding gains beyond the capability of advanced video coding standards, which becomes a powerful coding tool candidate for future video coding standards. However, the utilization of deep neural networks brings heavy time… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 11 pages, 6 figures

  14. arXiv:2407.09966  [pdf, other

    cs.CV eess.IV

    Optimizing ROI Benefits Vehicle ReID in ITS

    Authors: Mei Qiu, Lauren Ann Christopher, Lingxi Li, Stanley Chien, Yaobin Chen

    Abstract: Vehicle re-identification (ReID) is a computer vision task that matches the same vehicle across different cameras or viewpoints in a surveillance system. This is crucial for Intelligent Transportation Systems (ITS), where the effectiveness is influenced by the regions from which vehicle images are cropped. This study explores whether optimal vehicle detection regions, guided by detection confidenc… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

  15. arXiv:2407.08951  [pdf, other

    cs.SD eess.AS

    Audio Spotforming Using Nonnegative Tensor Factorization with Attractor-Based Regularization

    Authors: Shoma Ayano, Li Li, Shogo Seki, Daichi Kitamura

    Abstract: Spotforming is a target-speaker extraction technique that uses multiple microphone arrays. This method applies beamforming (BF) to each microphone array, and the common components among the BF outputs are estimated as the target source. This study proposes a new common component extraction method based on nonnegative tensor factorization (NTF) for higher model interpretability and more robust spot… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted at EUSIPCO2024

  16. arXiv:2407.06705  [pdf, other

    cs.NI eess.SP

    Integrated Sensing and Communications for Resource Allocation in Non-Terrestrial Networks

    Authors: Israel Leyva-Mayorga, Fabio Saggese, Lintao Li, Petar Popovski

    Abstract: The integration of Non-Terrestrial Networks (NTNs) with Low Earth Orbit (LEO) satellite constellations into 5G and Beyond is essential to achieve truly global connectivity. A distinctive characteristic of LEO mega-constellations is that they constitute a global infrastructure with predictable dynamics, which enables the pre-planned allocation of the radio resources. However, the different bands th… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Submitted for publication to IEEE Transactions on Wireless Communications

  17. arXiv:2407.03966  [pdf, other

    cs.SD cs.AI eess.AS

    Serialized Output Training by Learned Dominance

    Authors: Ying Shi, Lantian Li, Shi Yin, Dong Wang, Jiqing Han

    Abstract: Serialized Output Training (SOT) has showcased state-of-the-art performance in multi-talker speech recognition by sequentially decoding the speech of individual speakers. To address the challenging label-permutation issue, prior methods have relied on either the Permutation Invariant Training (PIT) or the time-based First-In-First-Out (FIFO) rule. This study presents a model-based serialization st… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: accepted by INTERSPEECH 2024

  18. arXiv:2407.00280  [pdf, other

    eess.IV cs.CV

    IVCA: Inter-Relation-Aware Video Complexity Analyzer

    Authors: Junqi Liao, Yao Li, Zhuoyuan Li, Li Li, Dong Liu

    Abstract: To meet the real-time analysis requirements of video streaming applications, we propose an inter-relation-aware video complexity analyzer (IVCA) as an extension to VCA. The IVCA addresses the limitation of VCA by considering inter-frame relations, namely motion and reference structure. First, we enhance the accuracy of temporal features by introducing feature-domain motion estimation into the IVCA… ▽ More

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: The report for the solution of second prize winner in ICIP 2024 Grand Challenge on Video Complexity (Team: USTC-iVC_Team1, USTC-iVC_Team2)

  19. arXiv:2406.14118  [pdf, other

    eess.IV cs.CV

    Prediction and Reference Quality Adaptation for Learned Video Compression

    Authors: Xihua Sheng, Li Li, Dong Liu, Houqiang Li

    Abstract: Temporal prediction is one of the most important technologies for video compression. Various prediction coding modes are designed in traditional video codecs. Traditional video codecs will adaptively to decide the optimal coding mode according to the prediction quality and reference quality. Recently, learned video codecs have made great progress. However, they ignore the prediction and reference… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  20. arXiv:2406.14067  [pdf

    physics.optics eess.SP

    A microwave photonic prototype for concurrent radar detection and spectrum sensing over an 8 to 40 GHz bandwidth

    Authors: Taixia Shi, Dingding Liang, Lu Wang, Lin Li, Shaogang Guo, Jiawei Gao, Xiaowei Li, Chulun Lin, Lei Shi, Baogang Ding, Shiyang Liu, Fangyi Yang, Chi Jiang, Yang Chen

    Abstract: In this work, a microwave photonic prototype for concurrent radar detection and spectrum sensing is proposed, designed, built, and investigated. A direct digital synthesizer and an analog electronic circuit are integrated to generate an intermediate frequency (IF) linearly frequency-modulated (LFM) signal with a tunable center frequency from 2.5 to 9.5 GHz and an instantaneous bandwidth of 1 GHz.… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 18 pages, 12 figures, 1 table

  21. arXiv:2406.13982  [pdf, other

    cs.SD eess.AS

    Improved Remixing Process for Domain Adaptation-Based Speech Enhancement by Mitigating Data Imbalance in Signal-to-Noise Ratio

    Authors: Li Li, Shogo Seki

    Abstract: RemixIT and Remixed2Remixed are domain adaptation-based speech enhancement (DASE) methods that use a teacher model trained in full supervision to generate pseudo-paired data by remixing the outputs of the teacher model. The student model for enhancing real-world recorded signals is trained using the pseudo-paired data without ground truth. Since the noisy signals are recorded in natural environmen… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech2024

  22. arXiv:2406.12650  [pdf, other

    eess.IV

    Weakly Supervised Learning of Cortical Surface Reconstruction from Segmentations

    Authors: Qiang Ma, Liu Li, Emma C. Robinson, Bernhard Kainz, Daniel Rueckert

    Abstract: Existing learning-based cortical surface reconstruction approaches heavily rely on the supervision of pseudo ground truth (pGT) cortical surfaces for training. Such pGT surfaces are generated by traditional neuroimage processing pipelines, which are time consuming and difficult to generalize well to low-resolution brain MRI, e.g., from fetuses and neonates. In this work, we present CoSeg, a learni… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted by the 27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024)

  23. arXiv:2406.10956  [pdf, other

    cs.SD cs.LG eess.AS

    Robust Channel Learning for Large-Scale Radio Speaker Verification

    Authors: Wenhao Yang, Jianguo Wei, Wenhuan Lu, Lei Li, Xugang Lu

    Abstract: Recent research in speaker verification has increasingly focused on achieving robust and reliable recognition under challenging channel conditions and noisy environments. Identifying speakers in radio communications is particularly difficult due to inherent limitations such as constrained bandwidth and pervasive noise interference. To address this issue, we present a Channel Robust Speaker Learnin… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 12 pages, 11 figures

  24. arXiv:2406.08203  [pdf, other

    eess.AS cs.SD

    LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation

    Authors: Wenhao Guan, Kaidi Wang, Wangjin Zhou, Yang Wang, Feng Deng, Hui Wang, Lin Li, Qingyang Hong, Yong Qin

    Abstract: Recently, the application of diffusion models has facilitated the significant development of speech and audio generation. Nevertheless, the quality of samples generated by diffusion models still needs improvement. And the effectiveness of the method is accompanied by the extensive number of sampling steps, leading to an extended synthesis time necessary for generating high-quality audio. Previous… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech2024

  25. arXiv:2406.07854  [pdf, other

    cs.SD cs.MM eess.AS

    Zero-Shot Fake Video Detection by Audio-Visual Consistency

    Authors: Xiaolou Li, Zehua Liu, Chen Chen, Lantian Li, Li Guo, Dong Wang

    Abstract: Recent studies have advocated the detection of fake videos as a one-class detection task, predicated on the hypothesis that the consistency between audio and visual modalities of genuine data is more significant than that of fake data. This methodology, which solely relies on genuine audio-visual data while negating the need for forged counterparts, is thus delineated as a `zero-shot' detection pa… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: to be published in INTERSPEECH 2024

  26. arXiv:2406.07832  [pdf, other

    cs.SD eess.AS

    SE/BN Adapter: Parametric Efficient Domain Adaptation for Speaker Recognition

    Authors: Tianhao Wang, Lantian Li, Dong Wang

    Abstract: Deploying a well-optimized pre-trained speaker recognition model in a new domain often leads to a significant decline in performance. While fine-tuning is a commonly employed solution, it demands ample adaptation data and suffers from parameter inefficiency, rendering it impractical for real-world applications with limited data available for model adaptation. Drawing inspiration from the success o… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: to be published in INTERSPEECH 2024

  27. arXiv:2406.07421  [pdf, other

    cs.SD eess.AS

    A Comprehensive Investigation on Speaker Augmentation for Speaker Recognition

    Authors: Zhenyu Zhou, Shibiao Xu, Shi Yin, Lantian Li, Dong Wang

    Abstract: Data augmentation (DA) has played a pivotal role in the success of deep speaker recognition. Current DA techniques primarily focus on speaker-preserving augmentation, which does not change the speaker trait of the speech and does not create new speakers. Recent research has shed light on the potential of speaker augmentation, which generates new speakers to enrich the training dataset. In this stu… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: to be published in INTERSPEECH 2024

  28. arXiv:2405.16765  [pdf, ps, other

    cs.LG eess.SP

    Study of Robust Direction Finding Based on Joint Sparse Representation

    Authors: Y. Li, W. Xiao, L. Zhao, Z. Huang, Q. Li, L. Li, R. C. de Lamare

    Abstract: Standard Direction of Arrival (DOA) estimation methods are typically derived based on the Gaussian noise assumption, making them highly sensitive to outliers. Therefore, in the presence of impulsive noise, the performance of these methods may significantly deteriorate. In this paper, we model impulsive noise as Gaussian noise mixed with sparse outliers. By exploiting their statistical differences,… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: 6 pages, 4 figures

  29. arXiv:2405.16047  [pdf, ps, other

    cs.IT eess.SP

    Unified Timing Analysis for Closed-Loop Goal-Oriented Wireless Communication

    Authors: Lintao Li, Anders E. Kalør, Petar Popovski, Wei Chen

    Abstract: Goal-oriented communication has become one of the focal concepts in sixth-generation communication systems owing to its potential to provide intelligent, immersive, and real-time mobile services. The emerging paradigms of goal-oriented communication constitute closed loops integrating communication, computation, and sensing. However, challenges arise for closed-loop timing analysis due to multiple… ▽ More

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: submitted to IEEE Trans. Wireless Commun

  30. arXiv:2405.08783  [pdf, other

    eess.IV

    The Developing Human Connectome Project: A Fast Deep Learning-based Pipeline for Neonatal Cortical Surface Reconstruction

    Authors: Qiang Ma, Kaili Liang, Liu Li, Saga Masui, Yourong Guo, Chiara Nosarti, Emma C. Robinson, Bernhard Kainz, Daniel Rueckert

    Abstract: The Developing Human Connectome Project (dHCP) aims to explore developmental patterns of the human brain during the perinatal period. An automated processing pipeline has been developed to extract high-quality cortical surfaces from structural brain magnetic resonance (MR) images for the dHCP neonatal dataset. However, the current implementation of the pipeline requires more than 6.5 hours to proc… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

  31. arXiv:2405.07218  [pdf, other

    physics.med-ph eess.SY

    Chained Flexible Capsule Endoscope: Unraveling the Conundrum of Size Limitations and Functional Integration for Gastrointestinal Transitivity

    Authors: Sishen Yuan, Guang Li, Baijia Liang, Lailu Li, Qingzhuo Zheng, Shuang Song, Zhen Li, Hongliang Ren

    Abstract: Capsule endoscopes, predominantly serving diagnostic functions, provide lucid internal imagery but are devoid of surgical or therapeutic capabilities. Consequently, despite lesion detection, physicians frequently resort to traditional endoscopic or open surgical procedures for treatment, resulting in more complex, potentially risky interventions. To surmount these limitations, this study introduce… ▽ More

    Submitted 12 May, 2024; originally announced May 2024.

  32. arXiv:2405.04167  [pdf, other

    cs.CV eess.IV

    Bridging the Synthetic-to-Authentic Gap: Distortion-Guided Unsupervised Domain Adaptation for Blind Image Quality Assessment

    Authors: Aobo Li, Jinjian Wu, Yongxu Liu, Leida Li

    Abstract: The annotation of blind image quality assessment (BIQA) is labor-intensive and time-consuming, especially for authentic images. Training on synthetic data is expected to be beneficial, but synthetically trained models often suffer from poor generalization in real domains due to domain gaps. In this work, we make a key observation that introducing more distortion types in the synthetic dataset may… ▽ More

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted by CVPR2024

  33. arXiv:2405.02132  [pdf, other

    cs.SD cs.CL eess.AS

    Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets

    Authors: Xuelong Geng, Tianyi Xu, Kun Wei, Bingshen Mu, Hongfei Xue, He Wang, Yangze Li, Pengcheng Guo, Yuhang Dai, Longhao Li, Mingchen Shao, Lei Xie

    Abstract: Large Language Models (LLMs) have demonstrated unparalleled effectiveness in various NLP tasks, and integrating LLMs with automatic speech recognition (ASR) is becoming a mainstream paradigm. Building upon this momentum, our research delves into an in-depth examination of this paradigm on a large open-source Chinese dataset. Specifically, our research aims to evaluate the impact of various configu… ▽ More

    Submitted 6 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

  34. arXiv:2405.00885  [pdf, other

    cs.LG cs.NI eess.IV

    WHALE-FL: Wireless and Heterogeneity Aware Latency Efficient Federated Learning over Mobile Devices via Adaptive Subnetwork Scheduling

    Authors: Huai-an Su, Jiaxiang Geng, Liang Li, Xiaoqi Qin, Yanzhao Hou, Hao Wang, Xin Fu, Miao Pan

    Abstract: As a popular distributed learning paradigm, federated learning (FL) over mobile devices fosters numerous applications, while their practical deployment is hindered by participating devices' computing and communication heterogeneity. Some pioneering research efforts proposed to extract subnetworks from the global model, and assign as large a subnetwork as possible to the device for local training b… ▽ More

    Submitted 19 August, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

  35. arXiv:2404.16289  [pdf, other

    cs.IT eess.SP

    Deep Joint CSI Feedback and Multiuser Precoding for MIMO OFDM Systems

    Authors: Yiran Guo, Wei Chen, Jialong Xu, Lun Li, Bo Ai

    Abstract: The design of precoding plays a crucial role in achieving a high downlink sum-rate in multiuser multiple-input multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM) systems. In this correspondence, we propose a deep learning based joint CSI feedback and multiuser precoding method in frequency division duplex systems, aiming at maximizing the downlink sum-rate performance in an e… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

  36. arXiv:2404.16223  [pdf, other

    cs.CV eess.IV

    Deep RAW Image Super-Resolution. A NTIRE 2024 Challenge Survey

    Authors: Marcos V. Conde, Florin-Alexandru Vasluianu, Radu Timofte, Jianxing Zhang, Jia Li, Fan Wang, Xiaopeng Li, Zikun Liu, Hyunhee Park, Sejun Song, Changho Kim, Zhijuan Huang, Hongyuan Yu, Cheng Wan, Wending Xiang, Jiamin Lin, Hang Zhong, Qiaosong Zhang, Yue Sun, Xuanwu Yin, Kunlong Zuo, Senyan Xu, Siyuan Jiang, Zhijing Sun, Jiaying Zhu , et al. (10 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 RAW Image Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines, however, this problem is not as explored as in the RGB domain. Th goal of this challenge is to upscale RAW Bayer images by 2x, considering unknown degradations such as nois… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 - NTIRE Workshop

  37. arXiv:2404.11313  [pdf, other

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo , et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  38. arXiv:2404.08010  [pdf, other

    cs.LG eess.IV

    Differentiable Search for Finding Optimal Quantization Strategy

    Authors: Lianqiang Li, Chenqian Yan, Yefei Chen

    Abstract: To accelerate and compress deep neural networks (DNNs), many network quantization algorithms have been proposed. Although the quantization strategy of any algorithm from the state-of-the-arts may outperform others in some network architectures, it is hard to prove the strategy is always better than others, and even cannot judge that the strategy is always the best choice for all layers in a networ… ▽ More

    Submitted 15 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  39. arXiv:2404.06682  [pdf, other

    cs.SD eess.AS

    Learning Multidimensional Disentangled Representations of Instrumental Sounds for Musical Similarity Assessment

    Authors: Yuka Hashizume, Li Li, Atsushi Miyashita, Tomoki Toda

    Abstract: To achieve a flexible recommendation and retrieval system, it is desirable to calculate music similarity by focusing on multiple partial elements of musical pieces and allowing the users to select the element they want to focus on. A previous study proposed using multiple individual networks for calculating music similarity based on each instrumental sound, but it is impractical to use each signal… ▽ More

    Submitted 9 April, 2024; originally announced April 2024.

  40. DiffDet4SAR: Diffusion-based Aircraft Target Detection Network for SAR Images

    Authors: Zhou Jie, Xiao Chao, Peng Bo, Liu Zhen, Liu Li, Liu Yongxiang, Li Xiang

    Abstract: Aircraft target detection in SAR images is a challenging task due to the discrete scattering points and severe background clutter interference. Currently, methods with convolution-based or transformer-based paradigms cannot adequately address these issues. In this letter, we explore diffusion models for SAR image aircraft target detection for the first time and propose a novel \underline{Diff}usio… ▽ More

    Submitted 17 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

    Comments: accepted by IEEE GRSL

    Journal ref: IEEE Geoscience and Remote Sensing Letters, vol. 21, pp. 1-5, 2024, Art no. 4007905

  41. arXiv:2403.11694  [pdf, other

    eess.IV cs.CV

    Object Segmentation-Assisted Inter Prediction for Versatile Video Coding

    Authors: Zhuoyuan Li, Zikun Yuan, Li Li, Dong Liu, Xiaohu Tang, Feng Wu

    Abstract: In modern video coding standards, block-based inter prediction is widely adopted, which brings high compression efficiency. However, in natural videos, there are usually multiple moving objects of arbitrary shapes, resulting in complex motion fields that are difficult to compactly represent. This problem has been tackled by more flexible block partitioning methods in the Versatile Video Coding (VV… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 22 pages, 15 figures

  42. arXiv:2403.10786  [pdf, other

    eess.IV cs.CV cs.LG

    ContourDiff: Unpaired Image Translation with Contour-Guided Diffusion Models

    Authors: Yuwen Chen, Nicholas Konz, Hanxue Gu, Haoyu Dong, Yaqian Chen, Lin Li, Jisoo Lee, Maciej A. Mazurowski

    Abstract: Accurately translating medical images across different modalities (e.g., CT to MRI) has numerous downstream clinical and machine learning applications. While several methods have been proposed to achieve this, they often prioritize perceptual quality with respect to output domain features over preserving anatomical fidelity. However, maintaining anatomy during translation is essential for many tas… ▽ More

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Code will be released on GitHub

  43. arXiv:2403.10581  [pdf, other

    q-bio.QM cs.AI cs.CL cs.LG eess.SP

    Large Language Model-informed ECG Dual Attention Network for Heart Failure Risk Prediction

    Authors: Chen Chen, Lei Li, Marcel Beetz, Abhirup Banerjee, Ramneek Gupta, Vicente Grau

    Abstract: Heart failure (HF) poses a significant public health challenge, with a rising global mortality rate. Early detection and prevention of HF could significantly reduce its impact. We introduce a novel methodology for predicting HF risk using 12-lead electrocardiograms (ECGs). We present a novel, lightweight dual-attention ECG network designed to capture complex ECG features essential for early HF ris… ▽ More

    Submitted 22 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

    Comments: Under journal revision

  44. arXiv:2403.05937  [pdf, other

    cs.CV eess.IV

    Wavelet-Like Transform-Based Technology in Response to the Call for Proposals on Neural Network-Based Image Coding

    Authors: Cunhui Dong, Haichuan Ma, Haotian Zhang, Changsheng Gao, Li Li, Dong Liu

    Abstract: Neural network-based image coding has been developing rapidly since its birth. Until 2022, its performance has surpassed that of the best-performing traditional image coding framework -- H.266/VVC. Witnessing such success, the IEEE 1857.11 working subgroup initializes a neural network-based image coding standard project and issues a corresponding call for proposals (CfP). In response to the CfP, t… ▽ More

    Submitted 9 March, 2024; originally announced March 2024.

  45. arXiv:2403.04626  [pdf, other

    eess.IV cs.CL cs.CV cs.LG

    MedFLIP: Medical Vision-and-Language Self-supervised Fast Pre-Training with Masked Autoencoder

    Authors: Lei Li, Tianfang Zhang, Xinglin Zhang, Jiaqi Liu, Bingqi Ma, Yan Luo, Tao Chen

    Abstract: Within the domain of medical analysis, extensive research has explored the potential of mutual learning between Masked Autoencoders(MAEs) and multimodal data. However, the impact of MAEs on intermodality remains a key challenge. We introduce MedFLIP, a Fast Language-Image Pre-training method for Medical analysis. We explore MAEs for zero-shot learning with crossed domains, which enhances the model… ▽ More

    Submitted 30 May, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

  46. arXiv:2402.17550  [pdf, other

    cs.NI cs.AI eess.SP

    Emergency Caching: Coded Caching-based Reliable Map Transmission in Emergency Networks

    Authors: Zeyu Tian, Lianming Xu, Liang Li, Li Wang, Aiguo Fei

    Abstract: Many rescue missions demand effective perception and real-time decision making, which highly rely on effective data collection and processing. In this study, we propose a three-layer architecture of emergency caching networks focusing on data collection and reliable transmission, by leveraging efficient perception and edge caching technologies. Based on this architecture, we propose a disaster map… ▽ More

    Submitted 27 February, 2024; originally announced February 2024.

  47. arXiv:2402.14401  [pdf, other

    cs.CV cs.LG eess.IV

    Diffusion Model Based Visual Compensation Guidance and Visual Difference Analysis for No-Reference Image Quality Assessment

    Authors: Zhaoyang Wang, Bo Hu, Mingyang Zhang, Jie Li, Leida Li, Maoguo Gong, Xinbo Gao

    Abstract: Existing free-energy guided No-Reference Image Quality Assessment (NR-IQA) methods still suffer from finding a balance between learning feature information at the pixel level of the image and capturing high-level feature information and the efficient utilization of the obtained high-level feature information remains a challenge. As a novel class of state-of-the-art (SOTA) generative model, the dif… ▽ More

    Submitted 22 February, 2024; originally announced February 2024.

  48. arXiv:2402.09463  [pdf

    eess.IV

    Multi-Center Fetal Brain Tissue Annotation (FeTA) Challenge 2022 Results

    Authors: Kelly Payette, Céline Steger, Roxane Licandro, Priscille de Dumast, Hongwei Bran Li, Matthew Barkovich, Liu Li, Maik Dannecker, Chen Chen, Cheng Ouyang, Niccolò McConnell, Alina Miron, Yongmin Li, Alena Uus, Irina Grigorescu, Paula Ramirez Gilliland, Md Mahfuzur Rahman Siddiquee, Daguang Xu, Andriy Myronenko, Haoyu Wang, Ziyan Huang, Jin Ye, Mireia Alenyà, Valentin Comte, Oscar Camara , et al. (42 additional authors not shown)

    Abstract: Segmentation is a critical step in analyzing the developing human fetal brain. There have been vast improvements in automatic segmentation methods in the past several years, and the Fetal Brain Tissue Annotation (FeTA) Challenge 2021 helped to establish an excellent standard of fetal brain segmentation. However, FeTA 2021 was a single center study, and the generalizability of algorithms across dif… ▽ More

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: Results from FeTA Challenge 2022, held at MICCAI; Manuscript submitted. Supplementary Info (including submission methods descriptions) available here: https://zenodo.org/records/10628648

  49. arXiv:2402.05079  [pdf, other

    eess.IV cs.CV

    Mamba-UNet: UNet-Like Pure Visual Mamba for Medical Image Segmentation

    Authors: Ziyang Wang, Jian-Qing Zheng, Yichi Zhang, Ge Cui, Lei Li

    Abstract: In recent advancements in medical image analysis, Convolutional Neural Networks (CNN) and Vision Transformers (ViT) have set significant benchmarks. While the former excels in capturing local features through its convolution operations, the latter achieves remarkable global context understanding by leveraging self-attention mechanisms. However, both architectures exhibit limitations in efficiently… ▽ More

    Submitted 30 March, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

  50. arXiv:2402.04157  [pdf, other

    eess.SY math.DS math.OC

    Controller synthesis for input-state data with measurement errors

    Authors: Andrea Bisoffi, Lidong Li, Claudio De Persis, Nima Monshizadeh

    Abstract: We consider the problem of designing a state-feedback controller for a linear system, based only on noisy input-state data. We focus on input-state data corrupted by measurement errors, which, albeit less investigated, are as relevant as process disturbances in applications. For energy and instantaneous bounds on these measurement errors, we derive linear matrix inequalities for controller design… ▽ More

    Submitted 14 May, 2024; v1 submitted 6 February, 2024; originally announced February 2024.