Showing 1–50 of 113 results for author: Guo, L

Searching in archive eess.
  1. arXiv:2408.07566  [pdf]

    eess.SY

    Startup Control Optimization of He-Xe Cooled Space Nuclear Reactors Using a System Analysis Program

    Authors: Chengyuan Li, Leran Guo, Shanfang Huang, Jian Deng, Jiahe Shang

    Abstract: In recent years, achieving autonomous control in nuclear reactor operations has become pivotal for the effectiveness of Space Nuclear Power Systems (SNPS). However, compared to power control, the startup control of SNPS remains underexplored. This study introduces a multi-objective optimization framework aimed at enhancing startup control, leveraging a system level analysis program to simulate the…

    Submitted 14 August, 2024; v1 submitted 14 August, 2024; originally announced August 2024.

  2. arXiv:2407.13229  [pdf, other]

    cs.RO eess.SY

    Disturbance Observer for Estimating Coupled Disturbances

    Authors: Jindou Jia, Yuhang Liu, Kexin Guo, Xiang Yu, Lihua Xie, Lei Guo

    Abstract: High-precision control for nonlinear systems is impeded by the low-fidelity dynamical model and external disturbance. Especially, the intricate coupling between internal uncertainty and external disturbance is usually difficult to be modeled explicitly. Here we show an effective and convergent algorithm enabling accurate estimation of the coupled disturbance via combining control and learning phil…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 8 pages, 3 figures

  3. arXiv:2407.11161  [pdf, other]

    eess.SP

    S-RAN: Semantic-Aware Radio Access Networks

    Authors: Yao Sun, Lan Zhang, Linke Guo, Jian Li, Dusit Niyato, Yuguang Fang

    Abstract: Semantic communication (SemCom) has been a transformative paradigm, emphasizing the precise exchange of meaningful information over traditional bit-level transmissions. However, existing SemCom research, primarily centered on simplified scenarios like single-pair transmissions with direct wireless links, faces significant challenges when applied to real-world radio access networks (RANs). This art…

    Submitted 15 July, 2024; originally announced July 2024.

  4. arXiv:2407.09166  [pdf, ps, other]

    eess.SP

    68-Channel Highly-Integrated Neural Signal Processing PSoC with On-Chip Feature Extraction, Compression, and Hardware Accelerators for Neuroprosthetics in 22nm FDSOI

    Authors: Liyuan Guo, Annika Weiße, Seyed Mohammad Ali Zeinolabedin, Franz Marcus Schüffny, Marco Stolba, Qier Ma, Zhuo Wang, Stefan Scholze, Andreas Dixius, Marc Berthel, Johannes Partzsch, Dennis Walter, Georg Ellguth, Sebastian Höppner, Richard George, Christian Mayr

    Abstract: Multi-channel electrophysiology systems for recording of neuronal activity face significant data throughput limitations, hampering real-time, data-informed experiments. These limitations impact both experimental neurobiology research and next-generation neuroprosthetics. We present a novel solution that leverages the high integration density of 22nm FDSOI CMOS technology to address these challenge…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: 26 pages, 14 figures, 1 table, Journal

  5. arXiv:2407.08401  [pdf, other]

    eess.SY

    Application of Data-Driven Model Predictive Control for Autonomous Vehicle Steering

    Authors: Jiarui Zhang, Aijing Kong, Yu Tang, Zhichao Lv, Lulu Guo, Peng Hang

    Abstract: With the development of autonomous driving technology, there are increasing demands for vehicle control, and MPC has become a widely researched topic in both industry and academia. Existing MPC control methods based on vehicle kinematics or dynamics have challenges such as difficult modeling, numerous parameters, strong nonlinearity, and high computational cost. To address these issues, this paper…

    Submitted 18 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  6. arXiv:2406.09154  [pdf, other]

    cs.SD cs.CL eess.AS

    Diffusion Gaussian Mixture Audio Denoise

    Authors: Pu Wang, Junhui Li, Jialu Li, Liangdong Guo, Youshan Zhang

    Abstract: Recent diffusion models have achieved promising performances in audio-denoising tasks. The unique property of the reverse process could recover clean signals. However, the distribution of real-world noises does not comply with a single Gaussian distribution and is even unknown. The sampling of Gaussian noise conditions limits its application scenarios. To overcome these challenges, we propose a Di…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: INTERSPEECH 2024

  7. arXiv:2406.07854  [pdf, other]

    cs.SD cs.MM eess.AS

    Zero-Shot Fake Video Detection by Audio-Visual Consistency

    Authors: Xiaolou Li, Zehua Liu, Chen Chen, Lantian Li, Li Guo, Dong Wang

    Abstract: Recent studies have advocated the detection of fake videos as a one-class detection task, predicated on the hypothesis that the consistency between audio and visual modalities of genuine data is more significant than that of fake data. This methodology, which solely relies on genuine audio-visual data while negating the need for forged counterparts, is thus delineated as a `zero-shot' detection pa…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: to be published in INTERSPEECH 2024

  8. arXiv:2405.12996  [pdf, other]

    eess.IV

    Dose-aware Diffusion Model for 3D Low-dose PET: Multi-institutional Validation with Reader Study and Real Low-dose Data

    Authors: Huidong Xie, Weijie Gan, Bo Zhou, Ming-Kai Chen, Michal Kulon, Annemarie Boustani, Benjamin A. Spencer, Reimund Bayerlein, Xiongchao Chen, Qiong Liu, Xueqi Guo, Menghua Xia, Yinchi Zhou, Hui Liu, Liang Guo, Hongyu An, Ulugbek S. Kamilov, Hanzhong Wang, Biao Li, Axel Rominger, Kuangyu Shi, Ge Wang, Ramsey D. Badawi, Chi Liu

    Abstract: As PET imaging is accompanied by radiation exposure and potentially increased cancer risk, reducing radiation dose in PET scans without compromising the image quality is an important topic. Deep learning (DL) techniques have been investigated for low-dose PET imaging. However, existing models have often resulted in compromised image quality when achieving low-dose PET and have limited generalizabi…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 16 Pages, 15 Figures, 4 Tables. Paper under review. arXiv admin note: substantial text overlap with arXiv:2311.04248

  9. arXiv:2405.00135  [pdf, other]

    cs.IT eess.SP

    Improving Channel Resilience for Task-Oriented Semantic Communications: A Unified Information Bottleneck Approach

    Authors: Shuai Lyu, Yao Sun, Linke Guo, Xiaoyong Yuan, Fang Fang, Lan Zhang, Xianbin Wang

    Abstract: Task-oriented semantic communications (TSC) enhance radio resource efficiency by transmitting task-relevant semantic information. However, current research often overlooks the inherent semantic distinctions among encoded features. Due to unavoidable channel variations from time and frequency-selective fading, semantically sensitive feature units could be more susceptible to erroneous inference if…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: This work has been submitted to the IEEE Communications Letters

  10. arXiv:2403.15421  [pdf, other]

    eess.SP cs.LG stat.AP

    Agile gesture recognition for low-power applications: customisation for generalisation

    Authors: Ying Liu, Liucheng Guo, Valeri A. Makarov, Alexander Gorban, Evgeny Mirkes, Ivan Y. Tyukin

    Abstract: Automated hand gesture recognition has long been a focal point in the AI community. Traditionally, research in this field has predominantly focused on scenarios with access to a continuous flow of hand's images. This focus has been driven by the widespread use of cameras and the abundant availability of image data. However, there is an increasing demand for gesture recognition technologies that op…

    Submitted 12 March, 2024; originally announced March 2024.

  11. arXiv:2403.15145  [pdf, ps, other]

    cs.IT eess.SP

    Robust Resource Allocation for STAR-RIS Assisted SWIPT Systems

    Authors: Guangyu Zhu, Xidong Mu, Li Guo, Ao Huang, Shibiao Xu

    Abstract: A simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) assisted simultaneous wireless information and power transfer (SWIPT) system is proposed. More particularly, an STAR-RIS is deployed to assist in the information/power transfer from a multi-antenna access point (AP) to multiple single-antenna information users (IUs) and energy users (EUs), where two practica…

    Submitted 22 March, 2024; originally announced March 2024.

  12. arXiv:2403.15130  [pdf, ps, other]

    cs.IT eess.SP

    Coexisting Passive RIS and Active Relay Assisted NOMA Systems

    Authors: Ao Huang, Li Guo, Xidong Mu, Chao Dong, Yuanwei Liu

    Abstract: A novel coexisting passive reconfigurable intelligent surface (RIS) and active decode-and-forward (DF) relay assisted non-orthogonal multiple access (NOMA) transmission framework is proposed. In particular, two communication protocols are conceived, namely Hybrid NOMA (H-NOMA) and Full NOMA (F-NOMA). Based on the proposed two protocols, both the sum rate maximization and max-min rate fairness prob…

    Submitted 22 March, 2024; originally announced March 2024.

  13. arXiv:2403.15120  [pdf, ps, other]

    cs.IT eess.SP

    STAR-RIS Assisted Downlink Active and Uplink Backscatter Communications with NOMA

    Authors: Ao Huang, Xidong Mu, Li Guo

    Abstract: A simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) assisted downlink (DL) active and uplink (UL) backscatter communication (BackCom) framework is proposed. More particularly, a full-duplex (FD) base station (BS) communicates with the DL users via the STAR-RIS's transmission link, while exciting and receiving the information from the UL BackCom devices with t…

    Submitted 22 March, 2024; originally announced March 2024.

  14. arXiv:2403.10064  [pdf, other]

    eess.IV cs.CV

    Progressive Divide-and-Conquer via Subsampling Decomposition for Accelerated MRI

    Authors: Chong Wang, Lanqing Guo, Yufei Wang, Hao Cheng, Yi Yu, Bihan Wen

    Abstract: Deep unfolding networks (DUN) have emerged as a popular iterative framework for accelerated magnetic resonance imaging (MRI) reconstruction. However, conventional DUN aims to reconstruct all the missing information within the entire null space in each iteration. Thus it could be challenging when dealing with highly ill-posed degradation, usually leading to unsatisfactory reconstruction. In this wo…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  15. arXiv:2403.09355  [pdf, other]

    eess.IV cs.CV

    Mitigating Data Consistency Induced Discrepancy in Cascaded Diffusion Models for Sparse-view CT Reconstruction

    Authors: Hanyu Chen, Zhixiu Hao, Lin Guo, Liying Xiao

    Abstract: Sparse-view Computed Tomography (CT) image reconstruction is a promising approach to reduce radiation exposure, but it inevitably leads to image degradation. Although diffusion model-based approaches are computationally expensive and suffer from the training-sampling discrepancy, they provide a potential solution to the problem. This study introduces a novel Cascaded Diffusion with Discrepancy Mit…

    Submitted 14 March, 2024; originally announced March 2024.

  16. arXiv:2403.01956  [pdf, ps, other]

    cs.IT eess.SP

    Hybrid Active-Passive RIS Transmitter Enabled Energy-Efficient Multi-User Communications

    Authors: Ao Huang, Xidong Mu, Li Guo, Guangyu Zhu

    Abstract: A novel hybrid active-passive reconfigurable intelligent surface (RIS) transmitter enabled downlink multi-user communication system is investigated. Specifically, RISs are exploited to serve as transmitter antennas, where each element can flexibly switch between active and passive modes to deliver information to multiple users. The system energy efficiency (EE) maximization problem is formulated b…

    Submitted 4 March, 2024; originally announced March 2024.

  17. arXiv:2402.13776  [pdf, other]

    eess.IV cs.CV cs.LG

    Cas-DiffCom: Cascaded diffusion model for infant longitudinal super-resolution 3D medical image completion

    Authors: Lianghu Guo, Tianli Tao, Xinyi Cai, Zihao Zhu, Jiawei Huang, Lixuan Zhu, Zhuoyang Gu, Haifeng Tang, Rui Zhou, Siyan Han, Yan Liang, Qing Yang, Dinggang Shen, Han Zhang

    Abstract: Early infancy is a rapid and dynamic neurodevelopmental period for behavior and neurocognition. Longitudinal magnetic resonance imaging (MRI) is an effective tool to investigate such a crucial stage by capturing the developmental trajectories of the brain structures. However, longitudinal MRI acquisition always meets a serious data-missing problem due to participant dropout and failed scans, makin…

    Submitted 21 February, 2024; originally announced February 2024.

  18. arXiv:2401.08920  [pdf, other]

    eess.IV cs.CV

    Idempotence and Perceptual Image Compression

    Authors: Tongda Xu, Ziran Zhu, Dailan He, Yanghao Li, Lina Guo, Yuanyuan Wang, Zhe Wang, Hongwei Qin, Yan Wang, Jingjing Liu, Ya-Qin Zhang

    Abstract: Idempotence is the stability of image codec to re-compression. At the first glance, it is unrelated to perceptual image compression. However, we find that theoretically: 1) Conditional generative model-based perceptual codec satisfies idempotence; 2) Unconditional generative model with idempotence constraint is equivalent to conditional generative codec. Based on this newfound equivalence, we prop…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: ICLR 2024

  19. arXiv:2312.01928  [pdf, other]

    eess.SY eess.SP

    Consensus-Based Distributed Nonlinear Filtering with Kernel Mean Embedding

    Authors: Liping Guo, Jimin Wang, Yanlong Zhao, Ji-Feng Zhang

    Abstract: This paper proposes a consensus-based distributed nonlinear filter with kernel mean embedding (KME). This fills with gap of posterior density approximation with KME for distributed nonlinear dynamic systems. To approximate the posterior density, the system state is embedded into a higher-dimensional reproducing kernel Hilbert space (RKHS), and then the nonlinear measurement function is linearly co…

    Submitted 4 December, 2023; originally announced December 2023.

  20. arXiv:2311.18506  [pdf, other]

    stat.ML cs.LG eess.SY math.ST

    Global Convergence of Online Identification for Mixed Linear Regression

    Authors: Yujing Liu, Zhixin Liu, Lei Guo

    Abstract: Mixed linear regression (MLR) is a powerful model for characterizing nonlinear relationships by utilizing a mixture of linear regression sub-models. The identification of MLR is a fundamental problem, where most of the existing results focus on offline algorithms, rely on independent and identically distributed (i.i.d) data assumptions, and provide local convergence results only. This paper invest…

    Submitted 30 November, 2023; originally announced November 2023.

  21. arXiv:2311.10118  [pdf, other]

    eess.IV cs.CV q-bio.QM

    Now and Future of Artificial Intelligence-based Signet Ring Cell Diagnosis: A Survey

    Authors: Zhu Meng, Junhao Dong, Limei Guo, Fei Su, Guangxi Wang, Zhicheng Zhao

    Abstract: Since signet ring cells (SRCs) are associated with high peripheral metastasis rate and dismal survival, they play an important role in determining surgical approaches and prognosis, while they are easily missed by even experienced pathologists. Although automatic diagnosis SRCs based on deep learning has received increasing attention to assist pathologists in improving the diagnostic efficiency an…

    Submitted 16 November, 2023; originally announced November 2023.

  22. arXiv:2311.04248  [pdf, other]

    eess.IV

    DDPET-3D: Dose-aware Diffusion Model for 3D Ultra Low-dose PET Imaging

    Authors: Huidong Xie, Weijie Gan, Bo Zhou, Xiongchao Chen, Qiong Liu, Xueqi Guo, Liang Guo, Hongyu An, Ulugbek S. Kamilov, Ge Wang, Chi Liu

    Abstract: As PET imaging is accompanied by substantial radiation exposure and cancer risk, reducing radiation dose in PET scans is an important topic. Recently, diffusion models have emerged as the new state-of-the-art generative model to generate high-quality samples and have demonstrated strong potential for various tasks in medical imaging. However, it is difficult to extend diffusion models for 3D image…

    Submitted 28 November, 2023; v1 submitted 7 November, 2023; originally announced November 2023.

    Comments: Paper under review. 16 pages, 11 figures, 4 tables

  23. arXiv:2310.20427  [pdf, other]

    eess.IV cs.CV cs.LG

    Assessing and Enhancing Robustness of Deep Learning Models with Corruption Emulation in Digital Pathology

    Authors: Peixiang Huang, Songtao Zhang, Yulu Gan, Rui Xu, Rongqi Zhu, Wenkang Qin, Limei Guo, Shan Jiang, Lin Luo

    Abstract: Deep learning in digital pathology brings intelligence and automation as substantial enhancements to pathological analysis, the gold standard of clinical diagnosis. However, multiple steps from tissue preparation to slide imaging introduce various image corruptions, making it difficult for deep neural network (DNN) models to achieve stable diagnostic results for clinical use. In order to assess an…

    Submitted 31 October, 2023; originally announced October 2023.

  24. arXiv:2310.15548  [pdf, ps, other]

    eess.SP

    Knowledge-driven Meta-learning for CSI Feedback

    Authors: Han Xiao, Wenqiang Tian, Wendong Liu, Jiajia Guo, Zhi Zhang, Shi Jin, Zhihua Shi, Li Guo, Jia Shen

    Abstract: Accurate and effective channel state information (CSI) feedback is a key technology for massive multiple-input and multiple-output systems. Recently, deep learning (DL) has been introduced for CSI feedback enhancement through massive collected training data and lengthy training time, which is quite costly and impractical for realistic deployment. In this article, a knowledge-driven meta-learning a…

    Submitted 25 October, 2023; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: arXiv admin note: text overlap with arXiv:2301.13475

  25. arXiv:2310.11230  [pdf, other]

    eess.AS cs.LG cs.SD

    Zipformer: A faster and better encoder for automatic speech recognition

    Authors: Zengwei Yao, Liyong Guo, Xiaoyu Yang, Wei Kang, Fangjun Kuang, Yifan Yang, Zengrui Jin, Long Lin, Daniel Povey

    Abstract: The Conformer has become the most popular encoder model for automatic speech recognition (ASR). It adds convolution modules to a transformer to learn both local and global dependencies. In this work we describe a faster, more memory-efficient, and better-performing transformer, called Zipformer. Modeling changes include: 1) a U-Net-like encoder structure where middle stacks operate at lower frame…

    Submitted 9 April, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: Published as a conference paper at ICLR 2024

  26. arXiv:2310.04992  [pdf, other]

    eess.IV cs.CV

    VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence

    Authors: Jianing Qiu, Jian Wu, Hao Wei, Peilun Shi, Minqing Zhang, Yunyun Sun, Lin Li, Hanruo Liu, Hongyi Liu, Simeng Hou, Yuyang Zhao, Xuehui Shi, Junfang Xian, Xiaoxia Qu, Sirui Zhu, Lijie Pan, Xiaoniao Chen, Xiaojia Zhang, Shuai Jiang, Kebing Wang, Chenlong Yang, Mingqiang Chen, Sujie Fan, Jianhua Hu, Aiguo Lv, et al. (17 additional authors not shown)

    Abstract: We present VisionFM, a foundation model pre-trained with 3.4 million ophthalmic images from 560,457 individuals, covering a broad range of ophthalmic diseases, modalities, imaging devices, and demography. After pre-training, VisionFM provides a foundation to foster multiple ophthalmic artificial intelligence (AI) applications, such as disease screening and diagnosis, disease prognosis, subclassifi…

    Submitted 7 October, 2023; originally announced October 2023.

  27. arXiv:2309.09454  [pdf, ps, other]

    cs.LG eess.SY

    Asymptotically Efficient Online Learning for Censored Regression Models Under Non-I.I.D Data

    Authors: Lantian Zhang, Lei Guo

    Abstract: The asymptotically efficient online learning problem is investigated for stochastic censored regression models, which arise from various fields of learning and statistics but up to now still lacks comprehensive theoretical studies on the efficiency of the learning algorithms. For this, we propose a two-step online algorithm, where the first step focuses on achieving algorithm convergence, and the…

    Submitted 1 October, 2023; v1 submitted 17 September, 2023; originally announced September 2023.

    Comments: 35 pages

  28. arXiv:2309.08105  [pdf, other]

    eess.AS cs.SD

    Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context

    Authors: Wei Kang, Xiaoyu Yang, Zengwei Yao, Fangjun Kuang, Yifan Yang, Liyong Guo, Long Lin, Daniel Povey

    Abstract: In this paper, we introduce Libriheavy, a large-scale ASR corpus consisting of 50,000 hours of read English speech derived from LibriVox. To the best of our knowledge, Libriheavy is the largest freely-available corpus of speech with supervisions. Different from other open-sourced datasets that only provide normalized transcriptions, Libriheavy contains richer information such as punctuation, casin…

    Submitted 14 January, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP 2024

  29. arXiv:2309.07414  [pdf, other]

    eess.AS cs.CL cs.SD

    PromptASR for contextualized ASR with controllable style

    Authors: Xiaoyu Yang, Wei Kang, Zengwei Yao, Yifan Yang, Liyong Guo, Fangjun Kuang, Long Lin, Daniel Povey

    Abstract: Prompts are crucial to large language models as they provide context information such as topic or logical relationships. Inspired by this, we propose PromptASR, a framework that integrates prompts in end-to-end automatic speech recognition (E2E ASR) systems to achieve contextualized ASR with controllable style of transcriptions. Specifically, a dedicated text encoder encodes the text prompts and t…

    Submitted 24 January, 2024; v1 submitted 13 September, 2023; originally announced September 2023.

    Comments: Proc. ICASSP 2024

  30. arXiv:2308.13287  [pdf, other]

    eess.IV

    Efficient Learned Lossless JPEG Recompression

    Authors: Lina Guo, Yuanyuan Wang, Tongda Xu, Jixiang Luo, Dailan He, Zhenjun Ji, Shanshan Wang, Yang Wang, Hongwei Qin

    Abstract: JPEG is one of the most popular image compression methods. It is beneficial to compress those existing JPEG files without introducing additional distortion. In this paper, we propose a deep learning based method to further compress JPEG images losslessly. Specifically, we propose a Multi-Level Parallel Conditional Modeling (ML-PCM) architecture, which enables parallel decoding in different granula…

    Submitted 25 August, 2023; originally announced August 2023.

  31. arXiv:2308.10217  [pdf, other]

    eess.SY

    Fault Separation Based on An Excitation Operator with Application to a Quadrotor UAV

    Authors: Sicheng Zhou, Meng Wang, Jindou Jia, Kexin Guo, Xiang Yu, Youmin Zhang, Lei Guo

    Abstract: This paper presents an excitation operator based fault separation architecture for a quadrotor unmanned aerial vehicle (UAV) subject to loss of effectiveness (LoE) faults, actuator aging, and load uncertainty. The actuator fault dynamics is deeply excavated, containing the deep coupling information among the actuator faults, the system states, and control inputs. By explicitly considering the phys…

    Submitted 20 August, 2023; originally announced August 2023.

  32. arXiv:2308.08229  [pdf, other]

    eess.SY

    Composite Disturbance Filtering: A Novel State Estimation Scheme for Systems With Multi-Source, Heterogeneous, and Isomeric Disturbances

    Authors: Lei Guo, Wenshuo Li, Yukai Zhu, Xiang Yu, Zidong Wang

    Abstract: State estimation has long been a fundamental problem in signal processing and control areas. The main challenge is to design filters with ability to reject or attenuate various disturbances. With the arrival of big data era, the disturbances of complicated systems are physically multi-source, mathematically heterogenous, affecting the system dynamics via isomeric (additive, multiplicative and rece…

    Submitted 12 September, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

  33. arXiv:2307.07710  [pdf, other]

    cs.CV eess.IV

    ExposureDiffusion: Learning to Expose for Low-light Image Enhancement

    Authors: Yufei Wang, Yi Yu, Wenhan Yang, Lanqing Guo, Lap-Pui Chau, Alex C. Kot, Bihan Wen

    Abstract: Previous raw image-based low-light image enhancement methods predominantly relied on feed-forward neural networks to learn deterministic mappings from low-light to normally-exposed images. However, they failed to capture critical distribution information, leading to visually undesirable results. This work addresses the issue by seamlessly integrating a diffusion model with a physics-based exposure…

    Submitted 15 August, 2023; v1 submitted 15 July, 2023; originally announced July 2023.

    Comments: accepted by ICCV2023

  34. arXiv:2306.12058  [pdf, other]

    cs.CV eess.IV

    Beyond Learned Metadata-based Raw Image Reconstruction

    Authors: Yufei Wang, Yi Yu, Wenhan Yang, Lanqing Guo, Lap-Pui Chau, Alex C. Kot, Bihan Wen

    Abstract: While raw images have distinct advantages over sRGB images, e.g., linearity and fine-grained quantization levels, they are not widely adopted by general users due to their substantial storage requirements. Very recent studies propose to compress raw images by designing sampling masks within the pixel space of the raw image. However, these approaches often leave space for pursuing more effective im…

    Submitted 21 June, 2023; originally announced June 2023.

  35. arXiv:2306.00446  [pdf, other]

    eess.IV cs.CV

    Evaluation of Multi-indicator And Multi-organ Medical Image Segmentation Models

    Authors: Qi Ye, Lihua Guo

    Abstract: In recent years, "U-shaped" neural networks featuring encoder and decoder structures have gained popularity in the field of medical image segmentation. Various variants of this model have been developed. Nevertheless, the evaluation of these models has received less attention compared to model development. In response, we propose a comprehensive method for evaluating medical image segmentation mod…

    Submitted 1 June, 2023; originally announced June 2023.

  36. arXiv:2305.16592  [pdf, other]

    cs.SD eess.AS

    A Multi-Scale Attentive Transformer for Multi-Instrument Symbolic Music Generation

    Authors: Xipin Wei, Junhui Chen, Zirui Zheng, Li Guo, Lantian Li, Dong Wang

    Abstract: Recently, multi-instrument music generation has become a hot topic. Different from single-instrument generation, multi-instrument generation needs to consider inter-track harmony besides intra-track coherence. This is usually achieved by composing note segments from different instruments into a signal sequence. This composition could be on different scales, such as note, bar, or track. Most existi…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: to be published in INTERSPEECH 2023

  37. arXiv:2305.13553  [pdf, other]

    eess.SP

    Digital-SC: Digital Semantic Communication with Adaptive Network Split and Learned Non-Linear Quantization

    Authors: Lei Guo, Wei Chen, Yuxuan Sun, Bo Ai

    Abstract: Semantic communication, an intelligent communication paradigm that aims to transmit useful information in the semantic domain, is facilitated by deep learning techniques. Robust semantic features can be learned and transmitted in an analog fashion, but it poses new challenges to hardware, protocol, and encryption. In this paper, we propose a digital semantic communication system, which consists of…

    Submitted 4 January, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

  38. arXiv:2305.11558  [pdf, other]

    eess.AS cs.CL

    Blank-regularized CTC for Frame Skipping in Neural Transducer

    Authors: Yifan Yang, Xiaoyu Yang, Liyong Guo, Zengwei Yao, Wei Kang, Fangjun Kuang, Long Lin, Xie Chen, Daniel Povey

    Abstract: Neural Transducer and connectionist temporal classification (CTC) are popular end-to-end automatic speech recognition systems. Due to their frame-synchronous design, blank symbols are introduced to address the length mismatch between acoustic frames and output tokens, which might bring redundant computation. Previous studies managed to accelerate the training and inference of neural Transducers by…

    Submitted 19 May, 2023; originally announced May 2023.

    Comments: Accepted in INTERSPEECH 2023

  39. arXiv:2305.11539  [pdf, other]

    eess.AS

    Delay-penalized CTC implemented based on Finite State Transducer

    Authors: Zengwei Yao, Wei Kang, Fangjun Kuang, Liyong Guo, Xiaoyu Yang, Yifan Yang, Long Lin, Daniel Povey

    Abstract: Connectionist Temporal Classification (CTC) suffers from the latency problem when applied to streaming models. We argue that in CTC lattice, the alignments that can access more future context are preferred during training, thereby leading to higher symbol delay. In this work we propose the delay-penalized CTC which is augmented with latency penalty regularization. We devise a flexible and efficien…

    Submitted 19 May, 2023; originally announced May 2023.

    Comments: Accepted in INTERSPEECH 2023

  40. arXiv:2305.10631  [pdf]

    eess.IV cs.CV cs.LG

    An image segmentation algorithm based on multi-scale feature pyramid network

    Authors: Yu Xiao, Xin Yang, Sijuan Huang, Lihua Guo

    Abstract: Medical image segmentation is particularly critical as a prerequisite for relevant quantitative analysis in the treatment of clinical diseases. For example, in clinical cervical cancer radiotherapy, after acquiring subabdominal MRI images, a fast and accurate image segmentation of organs and tumors in MRI images can optimize the clinical radiotherapy process, whereas traditional approaches use man…

    Submitted 28 June, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

  41. arXiv:2304.08345  [pdf, other]

    cs.LG cs.CL cs.CV cs.MM eess.AS

    VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset

    Authors: Sihan Chen, Xingjian He, Longteng Guo, Xinxin Zhu, Weining Wang, Jinhui Tang, Jing Liu

    Abstract: In this paper, we propose a Vision-Audio-Language Omni-peRception pretraining model (VALOR) for multi-modal understanding and generation. Different from widely-studied vision-language pretraining models, VALOR jointly models relationships of vision, audio and language in an end-to-end manner. It contains three separate encoders for single modality representations, and a decoder for multimodal cond…

    Submitted 17 April, 2023; originally announced April 2023.

    Comments: Preprint version w/o audio files embeded in PDF. Audio embeded version can be found on project page or github

  42. arXiv:2303.10912  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    Exploring Representation Learning for Small-Footprint Keyword Spotting

    Authors: Fan Cui, Liyong Guo, Quandong Wang, Peng Gao, Yujun Wang

    Abstract: In this paper, we investigate representation learning for low-resource keyword spotting (KWS). The main challenges of KWS are limited labeled data and limited available device resources. To address those challenges, we explore representation learning for KWS by self-supervised contrastive learning and self-training with pretrained model. First, local-global contrastive siamese networks (LGCSiam) a…

    Submitted 20 March, 2023; originally announced March 2023.

  43. arXiv:2303.10897  [pdf, other]

    cs.SD cs.CL eess.AS q-bio.NC

    Relate auditory speech to EEG by shallow-deep attention-based network

    Authors: Fan Cui, Liyong Guo, Lang He, Jiyao Liu, ErCheng Pei, Yujun Wang, Dongmei Jiang

    Abstract: Electroencephalography (EEG) plays a vital role in detecting how the brain responds to different stimuli. In this paper, we propose a novel Shallow-Deep Attention-based Network (SDANet) to classify the correct auditory stimulus evoking the EEG signal. It adopts the Attention-based Correlation Module (ACM) to discover the connection between auditory speech and EEG from global aspect, and the Shallow-…

    Submitted 20 March, 2023; originally announced March 2023.

  44. arXiv:2303.08331  [pdf, other]

    cs.CV cs.LG cs.NE eess.IV

    Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting

    Authors: Gen Li, Jie Ji, Minghai Qin, Wei Niu, Bin Ren, Fatemeh Afghah, Linke Guo, Xiaolong Ma

    Abstract: As deep convolutional neural networks (DNNs) are widely used in various fields of computer vision, leveraging the overfitting ability of the DNN to achieve video resolution upscaling has become a new trend in the modern video delivery system. By dividing videos into chunks and overfitting each chunk with a super-resolution model, the server encodes videos before transmitting them to the clients, t…

    Submitted 18 June, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

    Comments: CVPR 2023 Highlight Paper

  45. arXiv:2303.02057  [pdf, other]

    eess.IV cs.CV

    Unsupervised Deep Digital Staining For Microscopic Cell Images Via Knowledge Distillation

    Authors: Ziwang Xu, Lanqing Guo, Shuyan Zhang, Alex C. Kot, Bihan Wen

    Abstract: Staining is critical to cell imaging and medical diagnosis, which is expensive, time-consuming, labor-intensive, and causes irreversible changes to cell tissues. Recent advances in deep learning enabled digital staining via supervised model training. However, it is difficult to obtain large-scale stained/unstained cell image pairs in practice, which need to be perfectly aligned with the supervisio…

    Submitted 3 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  46. arXiv:2302.12995  [pdf, other]

    cs.CV eess.IV

    Raw Image Reconstruction with Learned Compact Metadata

    Authors: Yufei Wang, Yi Yu, Wenhan Yang, Lanqing Guo, Lap-Pui Chau, Alex Kot, Bihan Wen

    Abstract: While raw images exhibit advantages over sRGB images (e.g., linearity and fine-grained quantization level), they are not widely used by common users due to the large storage requirements. Very recent works propose to compress raw images by designing the sampling masks in the raw image pixel space, leading to suboptimal image representations and redundant metadata. In this paper, we propose a novel…

    Submitted 27 February, 2023; v1 submitted 25 February, 2023; originally announced February 2023.

    Comments: Accepted by CVPR 2023

  47. A Lifetime Extended Energy Management Strategy for Fuel Cell Hybrid Electric Vehicles via Self-Learning Fuzzy Reinforcement Learning

    Authors: Liang Guo, Zhongliang Li, Rachid Outbib

    Abstract: Modeling difficulty, time-varying model, and uncertain external inputs are the main challenges for energy management of fuel cell hybrid electric vehicles. In the paper, a fuzzy reinforcement learning-based energy management strategy for fuel cell hybrid electric vehicles is proposed to reduce fuel consumption, maintain the batteries' long-term operation, and extend the lifetime of the fuel cells…

    Submitted 13 February, 2023; originally announced February 2023.

    Journal ref: 2022 10th International Conference on Systems and Control (ICSC), IEEE, Nov 2022, Marseille, France. pp.161-167

  48. arXiv:2301.13475  [pdf, ps, other]

    eess.SP

    A Knowledge-Driven Meta-Learning Method for CSI Feedback

    Authors: Han Xiao, Wenqiang Tian, Wendong Liu, Zhi Zhang, Zhihua Shi, Li Guo, Jia Shen

    Abstract: Accurate and effective channel state information (CSI) feedback is a key technology for massive multiple-input and multiple-output (MIMO) systems. Recently, deep learning (DL) has been introduced to enhance CSI feedback in massive MIMO application, where the massive collected training data and lengthy training time are costly and impractical for realistic deployment. In this paper, a knowledge-dri…

    Submitted 31 January, 2023; originally announced January 2023.

  49. Energy-Efficient Driving in Connected Corridors via Minimum Principle Control: Vehicle-in-the-Loop Experimental Verification in Mixed Fleets

    Authors: Tyler Ard, Longxiang Guo, Jihun Han, Yunyi Jia, Ardalan Vahidi, Dominik Karbowski

    Abstract: Connected and automated vehicles (CAVs) can plan and actuate control that explicitly considers performance, system safety, and actuation constraints in a manner more efficient than their human-driven counterparts. In particular, eco-driving is enabled through connected exchange of information from signalized corridors that share their upcoming signal phase and timing (SPaT). This is accomplished i…

    Submitted 17 November, 2022; originally announced November 2022.

    Comments: 13 Figures

  50. arXiv:2211.00508  [pdf, other]

    eess.AS cs.CL cs.SD

    Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation

    Authors: Liyong Guo, Xiaoyu Yang, Quandong Wang, Yuxiang Kong, Zengwei Yao, Fan Cui, Fangjun Kuang, Wei Kang, Long Lin, Mingshuang Luo, Piotr Zelasko, Daniel Povey

    Abstract: Knowledge distillation (KD) is a common approach to improve model performance in automatic speech recognition (ASR), where a student model is trained to imitate the output behaviour of a teacher model. However, traditional KD methods suffer from teacher label storage issue, especially when the training corpora are large. Although on-the-fly teacher label generation tackles this issue, the training…

    Submitted 31 October, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP 2022