Showing 1–50 of 67 results for author: He, D

Searching in archive eess.
  1. arXiv:2408.00413  [pdf, other]

    cs.IT eess.SP

    Joint Antenna Position and Beamforming Optimization with Self-Interference Mitigation in MA-ISAC System

    Authors: Size Peng, Cixiao Zhang, Yin Xu, Qingqing Wu, Xiaowu Ou, Dazhi He

    Abstract: Movable antennas (MAs) have demonstrated significant potential in enhancing the performance of integrated sensing and communication (ISAC) systems. However, the application in the integrated and cost-effective full-duplex (FD) monostatic systems remains underexplored. To address this research gap, we develop an MA-ISAC model within a monostatic framework, where the self-interference channel is mod…

    Submitted 9 August, 2024; v1 submitted 1 August, 2024; originally announced August 2024.

  2. arXiv:2407.16634  [pdf, other]

    eess.IV cs.AI cs.CV cs.HC

    Knowledge-driven AI-generated data for accurate and interpretable breast ultrasound diagnoses

    Authors: Haojun Yu, Youcheng Li, Nan Zhang, Zihan Niu, Xuantong Gong, Yanwen Luo, Quanlin Wu, Wangyan Qin, Mengyuan Zhou, Jie Han, Jia Tao, Ziwei Zhao, Di Dai, Di He, Dong Wang, Binghui Tang, Ling Huo, Qingli Zhu, Yong Wang, Liwei Wang

    Abstract: Data-driven deep learning models have shown great capabilities to assist radiologists in breast ultrasound (US) diagnoses. However, their effectiveness is limited by the long-tail distribution of training data, which leads to inaccuracies in rare cases. In this study, we address a long-standing challenge of improving the diagnostic model performance on rare cases using long-tailed data. Specifical…

    Submitted 23 July, 2024; originally announced July 2024.

  3. arXiv:2407.11651  [pdf, other]

    cs.IT eess.SP

    Fluid Antenna Grouping Index Modulation Design for MIMO Systems

    Authors: Xinghao Guo, Yin Xu, Dazhi He, Cixiao Zhang, Wenjun Zhang, Yi-yan Wu

    Abstract: Index modulation (IM) significantly enhances the spectral efficiency of fluid antennas (FAs) enabled multiple-input multiple-output (MIMO) systems, which is named FA-IM. However, due to the dense distribution of ports on the FA, the wireless channel exhibits a high spatial correlation, leading to severe performance degradation in the existing FA-IM-assisted MIMO systems. To tackle this issue, this…

    Submitted 16 August, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: A longer version with more details will be submitted to an IEEE journal

  4. arXiv:2407.00743  [pdf, other]

    cs.MM cs.AI cs.CL eess.AS

    AIMDiT: Modality Augmentation and Interaction via Multimodal Dimension Transformation for Emotion Recognition in Conversations

    Authors: Sheng Wu, Jiaxing Liu, Longbiao Wang, Dongxiao He, Xiaobao Wang, Jianwu Dang

    Abstract: Emotion Recognition in Conversations (ERC) is a popular task in natural language processing, which aims to recognize the emotional state of the speaker in conversations. While current research primarily emphasizes contextual modeling, there exists a dearth of investigation into effective multimodal fusion methods. We propose a novel framework called AIMDiT to solve the problem of multimodal fusion…

    Submitted 12 April, 2024; originally announced July 2024.

  5. arXiv:2406.14064  [pdf, other]

    cs.IT eess.SP

    PAPR Reduction with Pre-chirp Selection for Affine Frequency Division Multiplexing

    Authors: Haozhi Yuan, Yin Xu, Xinghao Guo, Yao Ge, Tianyao Ma, Haoyang Li, Dazhi He, Wenjun Zhang

    Abstract: Affine frequency division multiplexing (AFDM) is a promising new multicarrier technique for high-mobility communications based on discrete affine Fourier transform (DAFT). By properly tuning the pre-chirp parameter and the post-chirp parameter in the DAFT, the effective channel in the DAFT domain can completely circumvent path overlap, thereby constituting a full representation of delay-Doppler pr…

    Submitted 25 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  6. arXiv:2404.04848  [pdf, other]

    eess.IV cs.AI cs.CV

    Task-Aware Encoder Control for Deep Video Compression

    Authors: Xingtong Ge, Jixiang Luo, Xinjie Zhang, Tongda Xu, Guo Lu, Dailan He, Jing Geng, Yan Wang, Jun Zhang, Hongwei Qin

    Abstract: Prior research on deep video compression (DVC) for machine tasks typically necessitates training a unique codec for each specific task, mandating a dedicated decoder per task. In contrast, traditional video codecs employ a flexible encoder controller, enabling the adaptation of a single codec to different tasks through mechanisms like mode prediction. Drawing inspiration from this, we introduce an…

    Submitted 20 April, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  7. arXiv:2403.08551  [pdf, other]

    eess.IV cs.AI cs.CV cs.MM

    GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting

    Authors: Xinjie Zhang, Xingtong Ge, Tongda Xu, Dailan He, Yan Wang, Hongwei Qin, Guo Lu, Jing Geng, Jun Zhang

    Abstract: Implicit neural representations (INRs) recently achieved great success in image representation and compression, offering high visual quality and fast rendering speeds with 10-1000 FPS, assuming sufficient GPU resources are available. However, this requirement often hinders their use on low-end devices with limited memory. In response, we propose a groundbreaking paradigm of image representation an…

    Submitted 9 July, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: Accepted by ECCV 2024. Project Page: https://xingtongge.github.io/GaussianImage-page/ Code: https://github.com/Xinjie-Q/GaussianImage

  8. arXiv:2403.08505  [pdf, other]

    eess.IV cs.AI cs.CV cs.MM

    Content-aware Masked Image Modeling Transformer for Stereo Image Compression

    Authors: Xinjie Zhang, Shenyuan Gao, Zhening Liu, Jiawei Shao, Xingtong Ge, Dailan He, Tongda Xu, Yan Wang, Jun Zhang

    Abstract: Existing learning-based stereo image codec adopt sophisticated transformation with simple entropy models derived from single image codecs to encode latent representations. However, those entropy models struggle to effectively capture the spatial-disparity characteristics inherent in stereo images, which leads to suboptimal rate-distortion results. In this paper, we propose a stereo image compressi…

    Submitted 19 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  9. arXiv:2402.18152  [pdf, other]

    eess.IV cs.AI cs.CV

    Boosting Neural Representations for Videos with a Conditional Decoder

    Authors: Xinjie Zhang, Ren Yang, Dailan He, Xingtong Ge, Tongda Xu, Yan Wang, Hongwei Qin, Jun Zhang

    Abstract: Implicit neural representations (INRs) have emerged as a promising approach for video storage and processing, showing remarkable versatility across various video tasks. However, existing methods often fail to fully leverage their representation capabilities, primarily due to inadequate alignment of intermediate features during target frame decoding. This paper introduces a universal boosting frame…

    Submitted 16 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: Accepted by CVPR 2024

  10. arXiv:2401.14717  [pdf, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS

    Turn-taking and Backchannel Prediction with Acoustic and Large Language Model Fusion

    Authors: Jinhan Wang, Long Chen, Aparna Khare, Anirudh Raju, Pranav Dheram, Di He, Minhua Wu, Andreas Stolcke, Venkatesh Ravichandran

    Abstract: We propose an approach for continuous prediction of turn-taking and backchanneling locations in spoken dialogue by fusing a neural acoustic model with a large language model (LLM). Experiments on the Switchboard human-human conversation dataset demonstrate that our approach consistently outperforms the baseline models with single modality. We also develop a novel multi-task instruction fine-tuning…

    Submitted 26 January, 2024; originally announced January 2024.

    Comments: To appear in IEEE ICASSP 2024

  11. arXiv:2401.08920  [pdf, other]

    eess.IV cs.CV

    Idempotence and Perceptual Image Compression

    Authors: Tongda Xu, Ziran Zhu, Dailan He, Yanghao Li, Lina Guo, Yuanyuan Wang, Zhe Wang, Hongwei Qin, Yan Wang, Jingjing Liu, Ya-Qin Zhang

    Abstract: Idempotence is the stability of image codec to re-compression. At the first glance, it is unrelated to perceptual image compression. However, we find that theoretically: 1) Conditional generative model-based perceptual codec satisfies idempotence; 2) Unconditional generative model with idempotence constraint is equivalent to conditional generative codec. Based on this newfound equivalence, we prop…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: ICLR 2024

  12. Two-pass Endpoint Detection for Speech Recognition

    Authors: Anirudh Raju, Aparna Khare, Di He, Ilya Sklyar, Long Chen, Sam Alptekin, Viet Anh Trinh, Zhe Zhang, Colin Vaz, Venkatesh Ravichandran, Roland Maas, Ariya Rastrow

    Abstract: Endpoint (EP) detection is a key component of far-field speech recognition systems that assist the user through voice commands. The endpoint detector has to trade-off between accuracy and latency, since waiting longer reduces the cases of users being cut-off early. We propose a novel two-pass solution for endpointing, where the utterance endpoint detected from a first pass endpointer is verified b…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: ASRU 2023

  13. arXiv:2311.04769  [pdf]

    eess.IV cs.CV

    An attention-based deep learning network for predicting Platinum resistance in ovarian cancer

    Authors: Haoming Zhuang, Beibei Li, Jingtong Ma, Patrice Monkam, Shouliang Qi, Wei Qian, Dianning He

    Abstract: Background: Ovarian cancer is among the three most frequent gynecologic cancers globally. High-grade serous ovarian cancer (HGSOC) is the most common and aggressive histological type. Guided treatment for HGSOC typically involves platinum-based combination chemotherapy, necessitating an assessment of whether the patient is platinum-resistant. The purpose of this study is to propose a deep learning…

    Submitted 8 November, 2023; originally announced November 2023.

  14. arXiv:2310.17425  [pdf, ps, other]

    eess.SP cs.IT

    Detecting Abrupt Change of Channel Covariance Matrix in IRS-Assisted Communication

    Authors: Runnan Liu, Liang Liu, Yin Xu, Dazhi He, Wenjun Zhang, Chang Wen Chen

    Abstract: The knowledge of channel covariance matrices is crucial to the design of intelligent reflecting surface (IRS) assisted communication. However, channel covariance matrices may change suddenly in practice. This letter focuses on the detection of the above change in IRS-assisted communication. Specifically, we consider the uplink communication system consisting of a single-antenna user (UE), an IRS,…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: accepted by IEEE Wireless Communications Letters

  15. arXiv:2310.15566  [pdf, other]

    cs.IT eess.SP

    RIS-Aided Receive Generalized Spatial Modulation Design with Reflecting Modulation

    Authors: Xinghao Guo, Yin Xu, Hanjiang Hong, De Mi, Ruiqi Liu, Dazhi He, Wenjun Zhang, Yi-yan Wu

    Abstract: Spatial modulation (SM) transmits additional information bits by the selection of antennas. Generalized spatial modulation (GSM), as an advanced type of SM, can be divided into diversity and multiplexing (MUX) schemes according to the symbols carried on the selected antennas are identical or different. Recently, reconfigurable intelligent surface (RIS) assisted SM exhibits better reception perform…

    Submitted 15 April, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: 6 pages, submitted to Conference

  16. arXiv:2310.15565  [pdf, other]

    cs.IT eess.SP

    Capacity-based Spatial Modulation Constellation and Pre-scaling Design

    Authors: Xinghao Guo, Hanjiang Hong, Yin Xu, Yi-yan Wu, Dazhi He, Wenjun Zhang

    Abstract: Spatial Modulation (SM) can utilize the index of the transmit antenna (TA) to transmit additional information. In this paper, to improve the performance of SM, a non-uniform constellation (NUC) and pre-scaling coefficients optimization design scheme is proposed. The bit-interleaved coded modulation (BICM) capacity calculation formula of SM system is firstly derived. The constellation and pre-scali…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: 6 pages, conference

  17. arXiv:2309.08895  [pdf, other]

    cs.IT eess.SP

    CDDM: Channel Denoising Diffusion Models for Wireless Semantic Communications

    Authors: Tong Wu, Zhiyong Chen, Dazhi He, Liang Qian, Yin Xu, Meixia Tao, Wenjun Zhang

    Abstract: Diffusion models (DM) can gradually learn to remove noise, which have been widely used in artificial intelligence generated content (AIGC) in recent years. The property of DM for eliminating noise leads us to wonder whether DM can be applied to wireless communications to help the receiver mitigate the channel noise. To address this, we propose channel denoising diffusion models (CDDM) for semantic…

    Submitted 16 September, 2023; originally announced September 2023.

    Comments: submitted to IEEE Transactions on Wireless Communications. arXiv admin note: substantial text overlap with arXiv:2305.09161

  18. arXiv:2308.13287  [pdf, other]

    eess.IV

    Efficient Learned Lossless JPEG Recompression

    Authors: Lina Guo, Yuanyuan Wang, Tongda Xu, Jixiang Luo, Dailan He, Zhenjun Ji, Shanshan Wang, Yang Wang, Hongwei Qin

    Abstract: JPEG is one of the most popular image compression methods. It is beneficial to compress those existing JPEG files without introducing additional distortion. In this paper, we propose a deep learning based method to further compress JPEG images losslessly. Specifically, we propose a Multi-Level Parallel Conditional Modeling (ML-PCM) architecture, which enables parallel decoding in different granula…

    Submitted 25 August, 2023; originally announced August 2023.

  19. arXiv:2308.08154  [pdf, other]

    eess.IV cs.CV

    Conditional Perceptual Quality Preserving Image Compression

    Authors: Tongda Xu, Qian Zhang, Yanghao Li, Dailan He, Zhe Wang, Yuanyuan Wang, Hongwei Qin, Yan Wang, Jingjing Liu, Ya-Qin Zhang

    Abstract: We propose conditional perceptual quality, an extension of the perceptual quality defined in \citet{blau2018perception}, by conditioning it on user defined information. Specifically, we extend the original perceptual quality $d(p_{X},p_{\hat{X}})$ to the conditional perceptual quality $d(p_{X|Y},p_{\hat{X}|Y})$, where $X$ is the original image, $\hat{X}$ is the reconstructed, $Y$ is side informati…

    Submitted 16 August, 2023; originally announced August 2023.

  20. arXiv:2305.13794  [pdf, other]

    cs.CL eess.AS

    Personalized Predictive ASR for Latency Reduction in Voice Assistants

    Authors: Andreas Schwarz, Di He, Maarten Van Segbroeck, Mohammed Hethnawi, Ariya Rastrow

    Abstract: Streaming Automatic Speech Recognition (ASR) in voice assistants can utilize prefetching to partially hide the latency of response generation. Prefetching involves passing a preliminary ASR hypothesis to downstream systems in order to prefetch and cache a response. If the final ASR hypothesis after endpoint detection matches the preliminary one, the cached response can be delivered to the user, th…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted for Interspeech 2023

  21. arXiv:2305.09161  [pdf, other]

    cs.IT eess.SP

    CDDM: Channel Denoising Diffusion Models for Wireless Communications

    Authors: Tong Wu, Zhiyong Chen, Dazhi He, Liang Qian, Yin Xu, Meixia Tao, Wenjun Zhang

    Abstract: Diffusion models (DM) can gradually learn to remove noise, which have been widely used in artificial intelligence generated content (AIGC) in recent years. The property of DM for removing noise leads us to wonder whether DM can be applied to wireless communications to help the receiver eliminate the channel noise. To address this, we propose channel denoising diffusion models (CDDM) for wireless c…

    Submitted 16 May, 2023; originally announced May 2023.

  22. arXiv:2305.03704  [pdf, other]

    eess.SP

    A 3D Modeling Method for Scattering on Rough Surfaces at the Terahertz Band

    Authors: Ben Chen, Ke Guan, Danping He, Pengxiang Xie, Zhangdui Zhong, Jianwu Dou, Shahid Mumtaz, Wael Bazzi

    Abstract: The terahertz (THz) band (0.1-10 THz) is widely considered to be a candidate band for the sixth-generation mobile communication technology (6G). However, due to its short wavelength (less than 1 mm), scattering becomes a particularly significant propagation mechanism. In previous studies, we proposed a scattering model to characterize the scattering in THz bands, which can only reconstruct the sca…

    Submitted 5 May, 2023; originally announced May 2023.

  23. Adaptive Endpointing with Deep Contextual Multi-armed Bandits

    Authors: Do June Min, Andreas Stolcke, Anirudh Raju, Colin Vaz, Di He, Venkatesh Ravichandran, Viet Anh Trinh

    Abstract: Current endpointing (EP) solutions learn in a supervised framework, which does not allow the model to incorporate feedback and improve in an online setting. Also, it is a common practice to utilize costly grid-search to find the best configuration for an endpointing model. In this paper, we aim to provide a solution for adaptive endpointing by proposing an efficient method for choosing an optimal…

    Submitted 23 March, 2023; originally announced March 2023.

    Journal ref: Proc. IEEE ICASSP, June 2023

  24. arXiv:2303.04068  [pdf, other]

    cs.DB cs.CV cs.SD eess.AS

    VOCALExplore: Pay-as-You-Go Video Data Exploration and Model Building [Technical Report]

    Authors: Maureen Daum, Enhao Zhang, Dong He, Stephen Mussmann, Brandon Haynes, Ranjay Krishna, Magdalena Balazinska

    Abstract: We introduce VOCALExplore, a system designed to support users in building domain-specific models over video datasets. VOCALExplore supports interactive labeling sessions and trains models using user-supplied labels. VOCALExplore maximizes model quality by automatically deciding how to select samples based on observed skew in the collected labels. It also selects the optimal video representations t…

    Submitted 29 September, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

  25. arXiv:2301.11557  [pdf, other]

    eess.SP

    A Ray-tracing and Deep Learning Fusion Super-resolution Modeling Method for Wireless Mobile Channel

    Authors: Zhao Zhang, Danping He, Xiping Wang, Ke Guan, Zhangdui Zhong, Jianwu Dou

    Abstract: Mobile channel modeling has always been the core part for design, deployment and optimization of communication system, especially in 5G and beyond era. Deterministic channel modeling could precisely achieve mobile channel description, however with defects of equipment and time consuming. In this paper, we proposed a novel super resolution (SR) model for cluster characteristics prediction. The mode…

    Submitted 27 January, 2023; originally announced January 2023.

    Comments: 5 pages, 7 figures, accepted by EuCAP 2023

  26. arXiv:2301.04479  [pdf, other]

    eess.SP

    Super-resolution of Ray-tracing Channel Simulation via Attention Mechanism based Deep Learning Model

    Authors: Haoyang Zhang, Danping He, Xiping Wang, Wenbin Wang, Yunhao Cheng, Ke Guan

    Abstract: As an emerging approach, deep learning plays an increasingly influential role in channel modeling. Traditional ray tracing (RT) methods of channel modeling tend to be inefficient and expensive. In this paper, we present a super-resolution (SR) model for channel characteristics. Residual connection and attention mechanism are applied to this convolutional neural network (CNN) model. Experiments pro…

    Submitted 21 January, 2023; v1 submitted 11 January, 2023; originally announced January 2023.

  27. arXiv:2209.13055  [pdf, other]

    eess.IV cs.CV

    Effective Invertible Arbitrary Image Rescaling

    Authors: Zhihong Pan, Baopu Li, Dongliang He, Wenhao Wu, Errui Ding

    Abstract: Great successes have been achieved using deep learning techniques for image super-resolution (SR) with fixed scales. To increase its real world applicability, numerous models have also been proposed to restore SR images with arbitrary scale factors, including asymmetric ones where images are resized to different scales along horizontal and vertical directions. Though most models are only optimized…

    Submitted 26 September, 2022; originally announced September 2022.

  28. arXiv:2209.09244  [pdf, other]

    eess.IV cs.CV cs.LG

    Flexible Neural Image Compression via Code Editing

    Authors: Chenjian Gao, Tongda Xu, Dailan He, Hongwei Qin, Yan Wang

    Abstract: Neural image compression (NIC) has outperformed traditional image codecs in rate-distortion (R-D) performance. However, it usually requires a dedicated encoder-decoder pair for each point on R-D curve, which greatly hinders its practical deployment. While some recent works have enabled bitrate control via conditional coding, they impose strong prior during training and provide limited flexibility.…

    Submitted 19 September, 2022; originally announced September 2022.

    Comments: NeurIPS 2022

  29. arXiv:2209.04207  [pdf, other]

    eess.SP

    A Multi-Task Learning Model for Super Resolution of Wireless Channel Characteristics

    Authors: Xiping Wang, Zhao Zhang, Danping He, Ke Guan, Dongliang Liu, Jianwu Dou

    Abstract: Channel modeling has always been the core part in communication system design and development, especially in 5G and 6G era. Traditional approaches like stochastic channel modeling and ray-tracing (RT) based channel modeling depend heavily on measurement data or simulation, which are usually expensive and time consuming. In this paper, we propose a novel super resolution (SR) model for generating c…

    Submitted 9 September, 2022; originally announced September 2022.

    Comments: 6 pages, GLOBECOM 2022 CQRM accepted. Thanks haoyang for his help in uploading :)

  30. arXiv:2208.11184  [pdf, other]

    eess.IV cs.CV

    AIM 2022 Challenge on Super-Resolution of Compressed Image and Video: Dataset, Methods and Results

    Authors: Ren Yang, Radu Timofte, Xin Li, Qi Zhang, Lin Zhang, Fanglong Liu, Dongliang He, Fu li, He Zheng, Weihang Yuan, Pavel Ostyakov, Dmitry Vyal, Magauiya Zhussip, Xueyi Zou, Youliang Yan, Lei Li, Jingzhu Tang, Ming Chen, Shijie Zhao, Yu Zhu, Xiaoran Qin, Chenghua Li, Cong Leng, Jian Cheng, Claudio Rota, et al. (28 additional authors not shown)

    Abstract: This paper reviews the Challenge on Super-Resolution of Compressed Image and Video at AIM 2022. This challenge includes two tracks. Track 1 aims at the super-resolution of compressed image, and Track 2 targets the super-resolution of compressed video. In Track 1, we use the popular dataset DIV2K as the training, validation and test sets. In Track 2, we propose the LDV 3.0 dataset, which contains 3…

    Submitted 25 August, 2022; v1 submitted 23 August, 2022; originally announced August 2022.

    Comments: Camera-ready version

  31. arXiv:2207.14524  [pdf, other]

    eess.IV cs.CV

    Evaluating the Practicality of Learned Image Compression

    Authors: Hongjiu Yu, Qiancheng Sun, Jin Hu, Xingyuan Xue, Jixiang Luo, Dailan He, Yilong Li, Pengbo Wang, Yuanyuan Wang, Yaxu Dai, Yan Wang, Hongwei Qin

    Abstract: Learned image compression has achieved extraordinary rate-distortion performance in PSNR and MS-SSIM compared to traditional methods. However, it suffers from intensive computation, which is intolerable for real-world applications and leads to its limited industrial application for now. In this paper, we introduce neural architecture search (NAS) to designing more efficient networks with lower lat…

    Submitted 29 July, 2022; originally announced July 2022.

  32. arXiv:2207.01984  [pdf, ps, other]

    eess.SP cs.IT

    Detecting Abrupt Changes in Channel Covariance Matrix for MIMO Communication

    Authors: Runnan Liu, Liang Liu, Dazhi He, Wenjun Zhang, Erik G. Larsson

    Abstract: The acquisition of the channel covariance matrix is of paramount importance to many strategies in multiple-input-multiple-output (MIMO) communications, such as the minimum mean-square error (MMSE) channel estimation. Therefore, plenty of efficient channel covariance matrix estimation schemes have been proposed in the literature. However, an abrupt change in the channel covariance matrix may happen…

    Submitted 10 March, 2023; v1 submitted 5 July, 2022; originally announced July 2022.

    Comments: accepted by IEEE TWC

  33. arXiv:2205.14501  [pdf, other]

    eess.IV

    PO-ELIC: Perception-Oriented Efficient Learned Image Coding

    Authors: Dailan He, Ziming Yang, Hongjiu Yu, Tongda Xu, Jixiang Luo, Yuan Chen, Chenjian Gao, Xinjie Shi, Hongwei Qin, Yan Wang

    Abstract: In the past years, learned image compression (LIC) has achieved remarkable performance. The recent LIC methods outperform VVC in both PSNR and MS-SSIM. However, the low bit-rate reconstructions of LIC suffer from artifacts such as blurring, color drifting and texture missing. Moreover, those varied artifacts make image quality metrics correlate badly with human perceptual quality. In this paper, w…

    Submitted 28 May, 2022; originally announced May 2022.

    Comments: CVPR 2022 Workshop, 5th CLIC Image Compression Track

  34. arXiv:2203.16357  [pdf, other]

    eess.IV cs.CV

    Practical Learned Lossless JPEG Recompression with Multi-Level Cross-Channel Entropy Model in the DCT Domain

    Authors: Lina Guo, Xinjie Shi, Dailan He, Yuanyuan Wang, Rui Ma, Hongwei Qin, Yan Wang

    Abstract: JPEG is a popular image compression method widely used by individuals, data center, cloud storage and network filesystems. However, most recent progress on image compression mainly focuses on uncompressed images while ignoring trillions of already-existing JPEG images. To compress these JPEG images adequately and restore them back to JPEG format losslessly when needed, we propose a deep learning b…

    Submitted 30 March, 2022; originally announced March 2022.

    Comments: CVPR 2022

  35. arXiv:2203.10886  [pdf, other]

    cs.CV eess.IV

    ELIC: Efficient Learned Image Compression with Unevenly Grouped Space-Channel Contextual Adaptive Coding

    Authors: Dailan He, Ziming Yang, Weikun Peng, Rui Ma, Hongwei Qin, Yan Wang

    Abstract: Recently, learned image compression techniques have achieved remarkable performance, even surpassing the best manually designed lossy image coders. They are promising to be large-scale adopted. For the sake of practicality, a thorough investigation of the architecture design of learned image compression, regarding both compression performance and running speed, is essential. In this paper, we firs…

    Submitted 29 March, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

    Comments: accepted by CVPR 2022 (oral)

  36. arXiv:2203.00911  [pdf, other]

    eess.IV cs.CV

    Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence

    Authors: Zhihong Pan, Baopu Li, Dongliang He, Mingde Yao, Wenhao Wu, Tianwei Lin, Xin Li, Errui Ding

    Abstract: Deep learning based single image super-resolution models have been widely studied and superb results are achieved in upscaling low-resolution images with fixed scale factor and downscaling degradation kernel. To improve real world applicability of such models, there are growing interests to develop models optimized for arbitrary upscaling factors. Our proposed method is the first to treat arbitrar…

    Submitted 7 March, 2022; v1 submitted 2 March, 2022; originally announced March 2022.

    Comments: To appear at CVPR 2022

  37. arXiv:2202.10593  [pdf, other]

    eess.AS cs.SD

    VADOI:Voice-Activity-Detection Overlapping Inference For End-to-end Long-form Speech Recognition

    Authors: Jinhan Wang, Xiaosu Tong, Jinxi Guo, Di He, Roland Maas

    Abstract: While end-to-end models have shown great success on the Automatic Speech Recognition task, performance degrades severely when target sentences are long-form. The previous proposed methods, (partial) overlapping inference are shown to be effective on long-form decoding. For both methods, word error rate (WER) decreases monotonically when overlapping percentage decreases. Setting aside computational…

    Submitted 21 February, 2022; originally announced February 2022.

  38. arXiv:2202.07513  [pdf, other]

    eess.IV cs.CV

    Post-Training Quantization for Cross-Platform Learned Image Compression

    Authors: Dailan He, Ziming Yang, Yuan Chen, Qi Zhang, Hongwei Qin, Yan Wang

    Abstract: It has been witnessed that learned image compression has outperformed conventional image coding techniques and tends to be practical in industrial applications. One of the most critical issues that need to be considered is the non-deterministic calculation, which makes the probability prediction cross-platform inconsistent and frustrates successful decoding. We propose to solve this problem by int…

    Submitted 30 November, 2022; v1 submitted 15 February, 2022; originally announced February 2022.

  39. arXiv:2201.04302  [pdf, other]

    eess.IV cs.LG

    De-Noising of Photoacoustic Microscopy Images by Deep Learning

    Authors: Da He, Jiasheng Zhou, Xiaoyu Shang, Jiajia Luo, Sung-Liang Chen

    Abstract: As a hybrid imaging technology, photoacoustic microscopy (PAM) imaging suffers from noise due to the maximum permissible exposure of laser intensity, attenuation of ultrasound in the tissue, and the inherent noise of the transducer. De-noising is a post-processing method to reduce noise, and PAM image quality can be recovered. However, previous de-noising techniques usually heavily rely on mathema…

    Submitted 12 January, 2022; originally announced January 2022.

    Comments: 12 pages, 8 figures

  40. arXiv:2201.02834  [pdf, other]

    eess.SP cs.LG

    Reconfigurable Intelligent Surface Enabled Spatial Multiplexing with Fully Convolutional Network

    Authors: Bile Peng, Jan-Aike Termöhlen, Cong Sun, Danping He, Ke Guan, Tim Fingscheidt, Eduard A. Jorswieck

    Abstract: Reconfigurable intelligent surface (RIS) is an emerging technology for future wireless communication systems. In this work, we consider downlink spatial multiplexing enabled by the RIS for weighted sum-rate (WSR) maximization. In the literature, most solutions use alternating gradient-based optimization, which has moderate performance, high complexity, and limited scalability. We propose to apply…

    Submitted 21 September, 2022; v1 submitted 8 January, 2022; originally announced January 2022.

  41. arXiv:2109.14863  [pdf, other]

    cs.CV eess.IV

    HLIC: Harmonizing Optimization Metrics in Learned Image Compression by Reinforcement Learning

    Authors: Baocheng Sun, Meng Gu, Dailan He, Tongda Xu, Yan Wang, Hongwei Qin

    Abstract: Learned image compression is making good progress in recent years. Peak signal-to-noise ratio (PSNR) and multi-scale structural similarity (MS-SSIM) are the two most popular evaluation metrics. As different metrics only reflect certain aspects of human perception, works in this field normally optimize two models using PSNR and MS-SSIM as loss function separately, which is suboptimal and makes it d…

    Submitted 30 September, 2021; originally announced September 2021.

    Comments: working paper

  42. arXiv:2109.04192  [pdf, ps, other]

    eess.SP cs.IT

    Detection of Abrupt Change in Channel Covariance Matrix for Multi-Antenna Communication

    Authors: Runnan Liu, Liang Liu, Dazhi He, Wenjun Zhang, Erik G. Larsson

    Abstract: The knowledge of channel covariance matrices is of paramount importance to the estimation of instantaneous channels and the design of beamforming vectors in multi-antenna systems. In practice, an abrupt change in channel covariance matrices may occur due to the change in the environment and the user location. Although several works have proposed efficient algorithms to estimate the channel covaria…

    Submitted 9 September, 2021; originally announced September 2021.

    Comments: accepted by Globecom 2021

  43. arXiv:2104.10781  [pdf, other]

    eess.IV cs.CV

    NTIRE 2021 Challenge on Quality Enhancement of Compressed Video: Methods and Results

    Authors: Ren Yang, Radu Timofte, Jing Liu, Yi Xu, Xinjian Zhang, Minyi Zhao, Shuigeng Zhou, Kelvin C. K. Chan, Shangchen Zhou, Xiangyu Xu, Chen Change Loy, Xin Li, Fanglong Liu, He Zheng, Lielin Jiang, Qi Zhang, Dongliang He, Fu Li, Qingqing Dang, Yibin Huang, Matteo Maggioni, Zhongqian Fu, Shuai Xiao, Cheng li, Thomas Tanay, et al. (47 additional authors not shown)

    Abstract: This paper reviews the first NTIRE challenge on quality enhancement of compressed video, with a focus on the proposed methods and results. In this challenge, the new Large-scale Diverse Video (LDV) dataset is employed. The challenge has three tracks. Tracks 1 and 2 aim at enhancing the videos compressed by HEVC at a fixed QP, while Track 3 is designed for enhancing the videos compressed by x265 at…

    Submitted 31 August, 2022; v1 submitted 21 April, 2021; originally announced April 2021.

    Comments: Corrected the MOS values in Table 2, and corrected some minor typos

  44. arXiv:2104.05376  [pdf, other]

    cs.CV eess.IV

    Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer

    Authors: Tianwei Lin, Zhuoqi Ma, Fu Li, Dongliang He, Xin Li, Errui Ding, Nannan Wang, Jie Li, Xinbo Gao

    Abstract: Artistic style transfer aims at migrating the style from an example image to a content image. Currently, optimization-based methods have achieved great stylization quality, but expensive time cost restricts their practical applications. Meanwhile, feed-forward methods still fail to synthesize complex style, especially when holistic global and local patterns exist. Inspired by the common painting p…

    Submitted 17 April, 2021; v1 submitted 12 April, 2021; originally announced April 2021.

    Comments: Accepted by CVPR 2021. Codes will be released soon on https://github.com/PaddlePaddle/PaddleGAN/

  45. arXiv:2103.15306  [pdf, other]

    eess.IV cs.CV

    Checkerboard Context Model for Efficient Learned Image Compression

    Authors: Dailan He, Yaoyan Zheng, Baocheng Sun, Yan Wang, Hongwei Qin

    Abstract: For learned image compression, the autoregressive context model is proved effective in improving the rate-distortion (RD) performance. Because it helps remove spatial redundancies among latent representations. However, the decoding process must be done in a strict scan order, which breaks the parallelization. We propose a parallelizable checkerboard context model (CCM) to solve the problem. Our tw…

    Submitted 1 April, 2021; v1 submitted 28 March, 2021; originally announced March 2021.

    Comments: CVPR 2021

  46. arXiv:2103.08393  [pdf, other]

    eess.AS cs.LG cs.SD

    Wav2vec-C: A Self-supervised Model for Speech Representation Learning

    Authors: Samik Sadhu, Di He, Che-Wei Huang, Sri Harish Mallidi, Minhua Wu, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, Roland Maas

    Abstract: Wav2vec-C introduces a novel representation learning technique combining elements from wav2vec 2.0 and VQ-VAE. Our model learns to reproduce quantized representations from partially masked speech encoding using a contrastive loss in a way similar to Wav2vec 2.0. However, the quantization process is regularized by an additional consistency network that learns to reconstruct the input features to th…

    Submitted 23 June, 2021; v1 submitted 9 March, 2021; originally announced March 2021.

    Comments: To appear in Interspeech 2021

  47. arXiv:2103.00188  [pdf]

    eess.IV cs.CV cs.LG

    Super-resolution-based Change Detection Network with Stacked Attention Module for Images with Different Resolutions

    Authors: Mengxi Liu, Qian Shi, Andrea Marinoni, Da He, Xiaoping Liu, Liangpei Zhang

    Abstract: Change detection, which aims to distinguish surface changes based on bi-temporal images, plays a vital role in ecological protection and urban planning. Since high resolution (HR) images cannot be typically acquired continuously over time, bi-temporal images with different resolutions are often adopted for change detection in practical applications. Traditional subpixel-based methods for change de…

    Submitted 27 February, 2021; originally announced March 2021.

    Journal ref: IEEE Transactions on Geoscience and Remote Sensing. 2021

  48. arXiv:2101.05206  [pdf, ps, other]

    eess.SP cs.LG

    Deep Learning Assisted Calibrated Beam Training for Millimeter-Wave Communication Systems

    Authors: Ke Ma, Dongxuan He, Hancun Sun, Zhaocheng Wang, Sheng Chen

    Abstract: Huge overhead of beam training imposes a significant challenge in millimeter-wave (mmWave) wireless communications. To address this issue, in this paper, we propose a wide beam based training approach to calibrate the narrow beam direction according to the channel power leakage. To handle the complex nonlinear properties of the channel power leakage, deep learning is utilized to predict the optima…

    Submitted 19 July, 2021; v1 submitted 7 January, 2021; originally announced January 2021.

    Comments: Accepted by IEEE Transactions on Communications

  49. arXiv:2011.02332  [pdf, ps, other]

    eess.SP

    Deep Learning Assisted mmWave Beam Prediction with Prior Low-frequency Information

    Authors: Ke Ma, Dongxuan He, Hancun Sun, Zhaocheng Wang

    Abstract: Huge overhead of beam training poses a significant challenge to mmWave communications. To address this issue, beam tracking has been widely investigated whereas existing methods are hard to handle serious multipath interference and non-stationary scenarios. Inspired by the spatial similarity between low-frequency and mmWave channels in non-standalone architectures, this paper proposes to utilize p…

    Submitted 8 February, 2021; v1 submitted 30 October, 2020; originally announced November 2020.

  50. arXiv:2008.11617  [pdf, ps, other]

    cs.DC eess.SP

    A Bilateral Game Approach for Task Outsourcing in Multi-access Edge Computing

    Authors: Zheng Xiao, Dan He, Yu Chen, Anthony Theodore Chronopoulos, Schahram Dustdar, Jiayi Du

    Abstract: Multi-access edge computing (MEC) is a promising architecture to provide low-latency applications for future Internet of Things (IoT)-based network systems. Together with the increasing scholarly attention on task offloading, the problem of edge servers' resource allocation has been widely studied. Most of previous works focus on a single edge server (ES) serving multiple terminal entities (TEs),…

    Submitted 26 August, 2020; originally announced August 2020.