Showing 1–45 of 45 results for author: Zha, R

Searching in archive eess.
  1. arXiv:2408.13376  [pdf, other]

    cs.AI cs.LG eess.SY math.CT

    Reduce, Reuse, Recycle: Categories for Compositional Reinforcement Learning

    Authors: Georgios Bakirtzis, Michail Savvas, Ruihan Zhao, Sandeep Chinchali, Ufuk Topcu

    Abstract: In reinforcement learning, conducting task composition by forming cohesive, executable sequences from multiple tasks remains challenging. However, the ability to (de)compose tasks is a linchpin in developing robotic systems capable of learning complex behaviors. Yet, compositional reinforcement learning is beset with difficulties, including the high dimensionality of the problem space, scarcity of…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: ECAI 2024

  2. arXiv:2408.07926  [pdf, other]

    eess.SY

    Enhanced Equivalent Circuit Model for High Current Discharge of Lithium-Ion Batteries with Application to Electric Vertical Takeoff and Landing Aircraft

    Authors: Alireza Goshtasbi, Ruxiu Zhao, Ruiting Wang, Sangwoo Han, Wenting Ma, Jeremy Neubauer

    Abstract: Conventional battery equivalent circuit models (ECMs) have limited capability to predict performance at high discharge rates, where lithium depleted regions may develop and cause a sudden exponential drop in the cell's terminal voltage. Having accurate predictions of performance under such conditions is necessary for electric vertical takeoff and landing (eVTOL) aircraft applications, where high d…

    Submitted 15 August, 2024; originally announced August 2024.

  3. arXiv:2407.15496  [pdf, other]

    eess.SP

    Securing V2I Backscattering from Eavesdropper

    Authors: Ruotong Zhao, Deepak Mishra, Aruna Seneviratne

    Abstract: As our cities become more intelligent and more connected with new technologies like 6G, improving communication between vehicles and infrastructure is essential while reducing energy consumption. This study proposes a secure framework for vehicle-to-infrastructure (V2I) backscattering near an eavesdropping vehicle to maximize the sum secrecy rate of V2I backscatter communication over multiple cohe…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: This paper contains 6 pages and 6 figures, which are accepted and will be presented in the proceedings of the 2024 IEEE International Conference on Communications (ICC), but not online yet

  4. arXiv:2405.20693  [pdf, other]

    eess.IV cs.CV

    R$^2$-Gaussian: Rectifying Radiative Gaussian Splatting for Tomographic Reconstruction

    Authors: Ruyi Zha, Tao Jun Lin, Yuanhao Cai, Jiwen Cao, Yanhao Zhang, Hongdong Li

    Abstract: 3D Gaussian splatting (3DGS) has shown promising results in image rendering and surface reconstruction. However, its potential in volumetric reconstruction tasks, such as X-ray computed tomography, remains under-explored. This paper introduces R2-Gaussian, the first 3DGS-based framework for sparse-view tomographic reconstruction. By carefully deriving X-ray rasterization functions, we discover a p…

    Submitted 31 May, 2024; originally announced May 2024.

  5. arXiv:2405.09582  [pdf]

    cs.CV eess.IV

    AD-Aligning: Emulating Human-like Generalization for Cognitive Domain Adaptation in Deep Learning

    Authors: Zhuoying Li, Bohua Wan, Cong Mu, Ruzhang Zhao, Shushan Qiu, Chao Yan

    Abstract: Domain adaptation is pivotal for enabling deep learning models to generalize across diverse domains, a task complicated by variations in presentation and cognitive nuances. In this paper, we introduce AD-Aligning, a novel approach that combines adversarial training with source-target domain alignment to enhance generalization capabilities. By pretraining with Coral loss and standard loss, AD-Align…

    Submitted 21 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: Accepted by 2024 5th International Conference on Electronic Communication and Artificial Intelligence

  6. arXiv:2405.06165  [pdf, other]

    eess.SY

    Resilient control of networked switched systems subject to deception attack and DoS attack

    Authors: Rui Zhao, Zhiqiang Zuo, Ying Tan, Yijing Wang, Wentao Zhang

    Abstract: In this paper, the resilient control for switched systems in the presence of deception attack and denial-of-service (DoS) attack is addressed. Due to the interaction of two kinds of attacks and the asynchronous phenomenon of controller mode and subsystem mode, the system dynamics becomes much more complex. A criterion is derived to ensure the mean square security level of the closed-loop system. T…

    Submitted 9 May, 2024; originally announced May 2024.

  7. Advanced Long-Content Speech Recognition With Factorized Neural Transducer

    Authors: Xun Gong, Yu Wu, Jinyu Li, Shujie Liu, Rui Zhao, Xie Chen, Yanmin Qian

    Abstract: In this paper, we propose two novel approaches, which integrate long-content information into the factorized neural transducer (FNT) based architecture in both non-streaming (referred to as LongFNT) and streaming (referred to as SLongFNT) scenarios. We first investigate whether long-content transcriptions can improve the vanilla conformer transducer (C-T) models. Our experiments indicate that th…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Accepted by TASLP 2024

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 1803-1815, 2024

  8. arXiv:2402.17718  [pdf]

    cs.LG eess.SP

    Towards a Digital Twin Framework in Additive Manufacturing: Machine Learning and Bayesian Optimization for Time Series Process Optimization

    Authors: Vispi Karkaria, Anthony Goeckner, Rujing Zha, Jie Chen, Jianjing Zhang, Qi Zhu, Jian Cao, Robert X. Gao, Wei Chen

    Abstract: Laser-directed-energy deposition (DED) offers advantages in additive manufacturing (AM) for creating intricate geometries and material grading. Yet, challenges like material inconsistency and part variability remain, mainly due to its layer-wise fabrication. A key issue is heat accumulation during DED, which affects the material microstructure and properties. While closed-loop control methods for…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: 12 Pages, 10 Figures, 1 Table, NAMRC Conference

  9. arXiv:2402.05725  [pdf, other]

    cs.RO eess.SP

    Dual-modal Tactile E-skin: Enabling Bidirectional Human-Robot Interaction via Integrated Tactile Perception and Feedback

    Authors: Shilong Mu, Runze Zhao, Zenan Lin, Yan Huang, Shoujie Li, Chenchang Li, Xiao-Ping Zhang, Wenbo Ding

    Abstract: To foster an immersive and natural human-robot interaction, the implementation of tactile perception and feedback becomes imperative, effectively bridging the conventional sensory gap. In this paper, we propose a dual-modal electronic skin (e-skin) that integrates magnetic tactile sensing and vibration feedback for enhanced human-robot interaction. The dual-modal tactile e-skin offers multi-functi…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: 7 pages, 8 figures. Submitted to 2024 IEEE International Conference on Robotics and Automation (ICRA), Japan, Yokohama

  10. arXiv:2312.15863  [pdf, other]

    cs.LG cs.AI cs.RO eess.SY

    PDiT: Interleaving Perception and Decision-making Transformers for Deep Reinforcement Learning

    Authors: Hangyu Mao, Rui Zhao, Ziyue Li, Zhiwei Xu, Hao Chen, Yiqun Chen, Bin Zhang, Zhen Xiao, Junge Zhang, Jiangjin Yin

    Abstract: Designing better deep networks and better reinforcement learning (RL) algorithms are both important for deep RL. This work studies the former. Specifically, the Perception and Decision-making Interleaving Transformer (PDiT) network is proposed, which cascades two Transformers in a very natural way: the perceiving one focuses on the environmental perception by processing the observation at t…

    Submitted 25 December, 2023; originally announced December 2023.

    Comments: Proc. of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2024, full paper with oral presentation). Cover our preliminary study: arXiv:2212.14538

  11. arXiv:2309.08323  [pdf]

    cs.RO eess.SY

    MLP Based Continuous Gait Recognition of a Powered Ankle Prosthesis with Serial Elastic Actuator

    Authors: Yanze Li, Feixing Chen, Jingqi Cao, Ruoqi Zhao, Xuan Yang, Xingbang Yang, Yubo Fan

    Abstract: Powered ankle prostheses effectively assist people with lower limb amputation to perform daily activities. High performance prostheses with adjustable compliance and capability to predict and implement amputee's intent are crucial for them to be comparable to or better than a real limb. However, current designs fail to provide simple yet effective compliance of the joint with full potential of mod…

    Submitted 30 March, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

    Comments: Submitted to IROS 2024

  12. arXiv:2309.08131  [pdf, other]

    eess.AS cs.SD

    t-SOT FNT: Streaming Multi-talker ASR with Text-only Domain Adaptation Capability

    Authors: Jian Wu, Naoyuki Kanda, Takuya Yoshioka, Rui Zhao, Zhuo Chen, Jinyu Li

    Abstract: Token-level serialized output training (t-SOT) was recently proposed to address the challenge of streaming multi-talker automatic speech recognition (ASR). T-SOT effectively handles overlapped speech by representing multi-talker transcriptions as a single token stream with $\langle \text{cc}\rangle$ symbols interspersed. However, the use of a naive neural transducer architecture significantly cons…

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: 5 pages, 2 figures, submitted to ICASSP2024

  13. arXiv:2309.07369  [pdf, other]

    eess.AS cs.CL cs.SD

    Hybrid Attention-based Encoder-decoder Model for Efficient Language Model Adaptation

    Authors: Shaoshi Ling, Guoli Ye, Rui Zhao, Yifan Gong

    Abstract: Attention-based encoder-decoder (AED) speech recognition model has been widely successful in recent years. However, the joint optimization of acoustic model and language model in end-to-end manner has created challenges for text adaptation. In particular, effectively, quickly and inexpensively adapting text has become a primary concern for deploying AED systems in industry. To address this issue,…

    Submitted 13 September, 2023; originally announced September 2023.

  14. arXiv:2308.08125  [pdf, other]

    cs.SD cs.CL cs.HC eess.AS

    Radio2Text: Streaming Speech Recognition Using mmWave Radio Signals

    Authors: Running Zhao, Jiangtao Yu, Hang Zhao, Edith C. H. Ngai

    Abstract: Millimeter wave (mmWave) based speech recognition provides more possibility for audio-related applications, such as conference speech transcription and eavesdropping. However, considering the practicality in real scenarios, latency and recognizable vocabulary size are two critical factors that cannot be overlooked. In this paper, we propose Radio2Text, the first mmWave-based system for streaming a…

    Submitted 15 August, 2023; originally announced August 2023.

    Comments: Accepted by Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (ACM IMWUT/UbiComp 2023)

  15. arXiv:2306.08630  [pdf, other]

    eess.IV cs.CV

    High-Dimensional MR Reconstruction Integrating Subspace and Adaptive Generative Models

    Authors: Ruiyang Zhao, Xi Peng, Varun A. Kelkar, Mark A. Anastasio, Fan Lam

    Abstract: We present a novel method that integrates subspace modeling with an adaptive generative image prior for high-dimensional MR image reconstruction. The subspace model imposes an explicit low-dimensional representation of the high-dimensional images, while the generative image prior serves as a spatial constraint on the "contrast-weighted" images or the spatial coefficients of the subspace model. A f…

    Submitted 16 June, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

  16. arXiv:2306.07019  [pdf, other]

    cs.LG eess.SP

    Dynamic Causal Graph Convolutional Network for Traffic Prediction

    Authors: Junpeng Lin, Ziyue Li, Zhishuai Li, Lei Bai, Rui Zhao, Chen Zhang

    Abstract: Modeling complex spatiotemporal dependencies in correlated traffic series is essential for traffic prediction. While recent works have shown improved prediction performance by using neural networks to extract spatiotemporal correlations, their effectiveness depends on the quality of the graph structures used to represent the spatial topology of the traffic network. In this work, we propose a novel…

    Submitted 7 September, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

    Comments: Accepted to IEEE CASE 2023; Peter Luh Best Memorial Award for Young Researcher (Finalist)

  17. arXiv:2212.01992  [pdf, other]

    cs.CL cs.SD eess.AS

    Fast and accurate factorized neural transducer for text adaption of end-to-end speech recognition models

    Authors: Rui Zhao, Jian Xue, Partha Parthasarathy, Veljko Miljanic, Jinyu Li

    Abstract: Neural transducer is now the most popular end-to-end model for speech recognition, due to its naturally streaming ability. However, it is challenging to adapt it with text-only data. Factorized neural transducer (FNT) model was proposed to mitigate this problem. The improved adaptation ability of FNT on text-only adaptation data came at the cost of lowered accuracy compared to the standard neural…

    Submitted 23 February, 2023; v1 submitted 4 December, 2022; originally announced December 2022.

  18. arXiv:2211.09412  [pdf, other]

    cs.SD cs.CL eess.AS

    LongFNT: Long-form Speech Recognition with Factorized Neural Transducer

    Authors: Xun Gong, Yu Wu, Jinyu Li, Shujie Liu, Rui Zhao, Xie Chen, Yanmin Qian

    Abstract: Traditional automatic speech recognition (ASR) systems usually focus on individual utterances, without considering long-form speech with useful historical information, which is more practical in real scenarios. Simply attending longer transcription history for a vanilla neural transducer model shows no much gain in our preliminary experiments, since the prediction network is not a pure language mo…

    Submitted 17 November, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP2023

  19. arXiv:2211.00272  [pdf, other]

    eess.SP

    RF-CHORD: Towards Deployable RFID Localization System for Logistics Network

    Authors: Bo Liang, Purui Wang, Renjie Zhao, Heyu Guo, Pengyu Zhang, Junchen Guo, Shunmin Zhu, Hongqiang Harry Liu, Xinyu Zhang, Chenren Xu

    Abstract: RFID localization is considered the key enabler of automating the process of inventory tracking and management for high-performance logistic network. A practical and deployable RFID localization system needs to meet reliability, throughput, and range requirements. This paper presents RF-Chord, the first RFID localization system that simultaneously meets all three requirements. RF-Chord features a…

    Submitted 1 November, 2022; originally announced November 2022.

    Comments: To be published in NSDI 2023

  20. NAF: Neural Attenuation Fields for Sparse-View CBCT Reconstruction

    Authors: Ruyi Zha, Yanhao Zhang, Hongdong Li

    Abstract: This paper proposes a novel and fast self-supervised solution for sparse-view CBCT reconstruction (Cone Beam Computed Tomography) that requires no external training data. Specifically, the desired attenuation coefficients are represented as a continuous function of 3D spatial coordinates, parameterized by a fully-connected deep neural network. We synthesize projections discretely and train the net…

    Submitted 29 September, 2022; originally announced September 2022.

    Comments: MICCAI2022 (Oral)

  21. arXiv:2209.09108  [pdf, other]

    eess.SY

    Online Poisoning Attacks Against Data-Driven Predictive Control

    Authors: Yue Yu, Ruihan Zhao, Sandeep Chinchali, Ufuk Topcu

    Abstract: Data-driven predictive control (DPC) is a feedback control method for systems with unknown dynamics. It repeatedly optimizes a system's future trajectories based on past input-output data. We develop a numerical method that computes poisoning attacks that inject additive perturbations to the online output data to change the trajectories optimized by DPC. This method is based on implicitly differen…

    Submitted 23 November, 2022; v1 submitted 19 September, 2022; originally announced September 2022.

  22. arXiv:2207.06615  [pdf, other]

    eess.SY

    Approximate synchronization of coupled multi-valued logical networks

    Authors: Rong Zhao, Jun-e Feng, Biao Wang

    Abstract: This article deals with the approximate synchronization of two coupled multi-valued logical networks. According to the initial state set from which both systems start, two kinds of approximate synchronization problem, local approximate synchronization and global approximate synchronization, are proposed for the first time. Three new notions: approximate synchronization state set (ASSS), the maximu…

    Submitted 13 July, 2022; originally announced July 2022.

  23. arXiv:2206.11066  [pdf, other]

    cs.SD eess.AS

    Radio2Speech: High Quality Speech Recovery from Radio Frequency Signals

    Authors: Running Zhao, Jiangtao Yu, Tingle Li, Hang Zhao, Edith C. H. Ngai

    Abstract: Considering the microphone is easily affected by noise and soundproof materials, the radio frequency (RF) signal is a promising candidate to recover audio as it is immune to noise and can traverse many soundproof objects. In this paper, we introduce Radio2Speech, a system that uses RF signals to recover high quality speech from the loudspeaker. Radio2Speech can recover speech comparable to the qua…

    Submitted 22 June, 2022; originally announced June 2022.

    Comments: Accepted to INTERSPEECH 2022

  24. arXiv:2201.10737  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    Class-Aware Adversarial Transformers for Medical Image Segmentation

    Authors: Chenyu You, Ruihan Zhao, Fenglin Liu, Siyuan Dong, Sandeep Chinchali, Ufuk Topcu, Lawrence Staib, James S. Duncan

    Abstract: Transformers have made remarkable progress towards modeling long-range dependencies within the medical image analysis domain. However, current transformer-based models suffer from several disadvantages: (1) existing methods fail to capture the important features of the images due to the naive tokenization scheme; (2) the models suffer from information loss because they only consider single-scale f…

    Submitted 15 December, 2022; v1 submitted 25 January, 2022; originally announced January 2022.

  25. arXiv:2110.15327  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    MEGAN: Memory Enhanced Graph Attention Network for Space-Time Video Super-Resolution

    Authors: Chenyu You, Lianyi Han, Aosong Feng, Ruihan Zhao, Hui Tang, Wei Fan

    Abstract: Space-time video super-resolution (STVSR) aims to construct a high space-time resolution video sequence from the corresponding low-frame-rate, low-resolution video sequence. Inspired by the recent success to consider spatial-temporal information for space-time super-resolution, our main goal in this work is to take full considerations of spatial and temporal correlations within the video sequences…

    Submitted 29 November, 2021; v1 submitted 28 October, 2021; originally announced October 2021.

  26. arXiv:2109.11524  [pdf, other]

    cs.CV cs.LG eess.IV physics.med-ph

    End-to-End AI-based MRI Reconstruction and Lesion Detection Pipeline for Evaluation of Deep Learning Image Reconstruction

    Authors: Ruiyang Zhao, Yuxin Zhang, Burhaneddin Yaman, Matthew P. Lungren, Michael S. Hansen

    Abstract: Deep learning techniques have emerged as a promising approach to highly accelerated MRI. However, recent reconstruction challenges have shown several drawbacks in current deep learning approaches, including the loss of fine image details even using models that perform well in terms of global quality metrics. In this study, we propose an end-to-end deep learning framework for image reconstruction a…

    Submitted 23 September, 2021; originally announced September 2021.

  27. arXiv:2109.03812  [pdf]

    eess.IV cs.CV cs.LG physics.med-ph

    fastMRI+: Clinical Pathology Annotations for Knee and Brain Fully Sampled Multi-Coil MRI Data

    Authors: Ruiyang Zhao, Burhaneddin Yaman, Yuxin Zhang, Russell Stewart, Austin Dixon, Florian Knoll, Zhengnan Huang, Yvonne W. Lui, Michael S. Hansen, Matthew P. Lungren

    Abstract: Improving speed and image quality of Magnetic Resonance Imaging (MRI) via novel reconstruction approaches remains one of the highest impact applications for deep learning in medical imaging. The fastMRI dataset, unique in that it contains large volumes of raw MRI data, has enabled significant advances in accelerating MRI using deep learning-based reconstruction methods. While the impact of the fas…

    Submitted 13 September, 2021; v1 submitted 8 September, 2021; originally announced September 2021.

  28. arXiv:2105.07059  [pdf, other]

    cs.CV cs.LG eess.IV

    Momentum Contrastive Voxel-wise Representation Learning for Semi-supervised Volumetric Medical Image Segmentation

    Authors: Chenyu You, Ruihan Zhao, Lawrence Staib, James S. Duncan

    Abstract: Contrastive learning (CL) aims to learn useful representation without relying on expert annotations in the context of medical image segmentation. Existing approaches mainly contrast a single positive vector (i.e., an augmentation of the same image) against a set of negatives within the entire remainder of the batch by simply mapping all input features into the same constant vector. Despite the imp…

    Submitted 7 March, 2022; v1 submitted 14 May, 2021; originally announced May 2021.

  29. arXiv:2105.00858  [pdf, other]

    eess.AS cs.CL cs.SD

    On Addressing Practical Challenges for RNN-Transducer

    Authors: Rui Zhao, Jian Xue, Jinyu Li, Wenning Wei, Lei He, Yifan Gong

    Abstract: In this paper, several works are proposed to address practical challenges for deploying RNN Transducer (RNN-T) based speech recognition system. These challenges are adapting a well-trained RNN-T model to a new domain without collecting the audio data, obtaining time stamps and confidence scores at word level. The first challenge is solved with a splicing data method which concatenates the speech s…

    Submitted 18 July, 2021; v1 submitted 27 April, 2021; originally announced May 2021.

    Comments: 5 pages

  30. arXiv:2012.11736  [pdf, ps, other]

    eess.SP

    Energy Efficiency Maximization in RIS-Aided Cell-Free Network with Limited Backhaul

    Authors: Quang Nhat Le, Van-Dinh Nguyen, Octavia A. Dobre, Ruiqin Zhao

    Abstract: Integrating the reconfigurable intelligent surface in a cell-free (RIS-CF) network is an effective solution to improve the capacity and coverage of future wireless systems with low cost and power consumption. The reflecting coefficients of RISs can be programmed to enhance signals received at users. This letter addresses a joint design of transmit beamformers at access points and reflecting coeffi…

    Submitted 8 March, 2021; v1 submitted 21 December, 2020; originally announced December 2020.

    Comments: submitted for possible publication

  31. Full-Duplex Non-Orthogonal Multiple Access Cooperative Overlay Spectrum-Sharing Networks with SWIPT

    Authors: Quang Nhat Le, Animesh Yadav, Nam-Phong Nguyen, Octavia A. Dobre, Ruiqin Zhao

    Abstract: This paper proposes a novel non-orthogonal multiple access (NOMA) assisted cooperative spectrum sharing network, in which one of the full-duplex (FD) secondary transmitters (STs) is chosen among many for forwarding the primary transmitter's and its own information to primary receiver and secondary receivers, respectively, using NOMA technique. To stimulate the ST to conduct cooperative transmissio…

    Submitted 19 November, 2020; originally announced November 2020.

    Comments: accepted for publication in the IEEE Transactions on Green Communications and Networking

  32. arXiv:2011.07549  [pdf, ps, other]

    eess.SP

    Learning-Assisted User Clustering in Cell-Free Massive MIMO-NOMA Networks

    Authors: Quang Nhat Le, Van-Dinh Nguyen, Nam-Phong Nguyen, Symeon Chatzinotas, Octavia A. Dobre, Ruiqin Zhao

    Abstract: The superior spectral efficiency (SE) and user fairness feature of non-orthogonal multiple access (NOMA) systems are achieved by exploiting user clustering (UC) more efficiently. However, a random UC certainly results in a suboptimal solution while an exhaustive search method comes at the cost of high complexity, especially for systems of medium-to-large size. To address this problem, we develop t…

    Submitted 15 November, 2020; originally announced November 2020.

    Comments: submitted for possible publication

  33. arXiv:2011.01991  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition

    Authors: Zhong Meng, Sarangarajan Parthasarathy, Eric Sun, Yashesh Gaur, Naoyuki Kanda, Liang Lu, Xie Chen, Rui Zhao, Jinyu Li, Yifan Gong

    Abstract: The external language models (LM) integration remains a challenging task for end-to-end (E2E) automatic speech recognition (ASR) which has no clear division between acoustic and language models. In this work, we propose an internal LM estimation (ILME) method to facilitate a more effective integration of the external LM with all pre-existing E2E models with no additional model training, including…

    Submitted 3 November, 2020; originally announced November 2020.

    Comments: 8 pages, 2 figures, SLT 2021

    Journal ref: 2021 IEEE Spoken Language Technology Workshop (SLT)

  34. arXiv:2009.04286  [pdf, other]

    eess.IV cs.CV

    Enhancing and Learning Denoiser without Clean Reference

    Authors: Rui Zhao, Daniel P. K. Lun, Kin-Man Lam

    Abstract: Recent studies on learning-based image denoising have achieved promising performance on various noise reduction tasks. Most of these deep denoisers are trained either under the supervision of clean references, or unsupervised on synthetic noise. The assumption with the synthetic noise leads to poor generalization when facing real photographs. To address this issue, we propose a novel deep image-de…

    Submitted 28 March, 2021; v1 submitted 9 September, 2020; originally announced September 2020.

  35. arXiv:2008.05086  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Transfer Learning Approaches for Streaming End-to-End Speech Recognition System

    Authors: Vikas Joshi, Rui Zhao, Rupesh R. Mehta, Kshitiz Kumar, Jinyu Li

    Abstract: Transfer learning (TL) is widely used in conventional hybrid automatic speech recognition (ASR) system, to transfer the knowledge from source to target language. TL can be applied to end-to-end (E2E) ASR system such as recurrent neural network transducer (RNN-T) models, by initializing the encoder and/or prediction network of the target language with the pre-trained models from source language. In…

    Submitted 17 August, 2020; v1 submitted 11 August, 2020; originally announced August 2020.

  36. arXiv:2008.00250  [pdf, ps, other]

    eess.SP cs.LG eess.SY

    Deep Reinforcement Learning Based Mobile Edge Computing for Intelligent Internet of Things

    Authors: Rui Zhao, Xinjie Wang, Junjuan Xia, Liseng Fan

    Abstract: In this paper, we investigate mobile edge computing (MEC) networks for intelligent internet of things (IoT), where multiple users have some computational tasks assisted by multiple computational access points (CAPs). By offloading some tasks to the CAPs, the system performance can be improved through reducing the latency and energy consumption, which are the two important metrics of interest in th…

    Submitted 1 August, 2020; originally announced August 2020.

  37. arXiv:2007.15188  [pdf, other]

    eess.AS cs.CL cs.SD

    Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability

    Authors: Jinyu Li, Rui Zhao, Zhong Meng, Yanqing Liu, Wenning Wei, Sarangarajan Parthasarathy, Vadim Mazalov, Zhenghao Wang, Lei He, Sheng Zhao, Yifan Gong

    Abstract: Because of its streaming nature, recurrent neural network transducer (RNN-T) is a very promising end-to-end (E2E) model that may replace the popular hybrid model for automatic speech recognition. In this paper, we describe our recent development of RNN-T models with reduced GPU memory consumption during training, better initialization strategy, and advanced encoder modeling with future lookahead.…

    Submitted 29 July, 2020; originally announced July 2020.

    Comments: Accepted by Interspeech 2020

  38. Enhancement of a CNN-Based Denoiser Based on Spatial and Spectral Analysis

    Authors: Rui Zhao, Kin-Man Lam, Daniel P. K. Lun

    Abstract: Convolutional neural network (CNN)-based image denoising methods have been widely studied recently, because of their high-speed processing capability and good visual quality. However, most of the existing CNN-based denoisers learn the image prior from the spatial domain, and suffer from the problem of spatially variant noise, which limits their performance in real-world image denoising tasks. In t…

    Submitted 28 June, 2020; originally announced June 2020.

    Comments: ICIP 2019

  39. arXiv:2005.14327  [pdf, ps, other]

    eess.AS cs.CL

    On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition

    Authors: Jinyu Li, Yu Wu, Yashesh Gaur, Chengyi Wang, Rui Zhao, Shujie Liu

    Abstract: Recently, there has been a strong push to transition from hybrid models to end-to-end (E2E) models for automatic speech recognition. Currently, there are three promising E2E methods: recurrent neural network transducer (RNN-T), RNN attention-based encoder-decoder (AED), and Transformer-AED. In this study, we conduct an empirical comparison of RNN-T, RNN-AED, and Transformer-AED models, in both non…

    Submitted 29 July, 2020; v1 submitted 28 May, 2020; originally announced May 2020.

    Comments: Accepted by Interspeech 2020

  40. arXiv:2005.00572  [pdf, other]

    cs.CL eess.AS

    Exploring Pre-training with Alignments for RNN Transducer based End-to-End Speech Recognition

    Authors: Hu Hu, Rui Zhao, Jinyu Li, Liang Lu, Yifan Gong

    Abstract: Recently, the recurrent neural network transducer (RNN-T) architecture has become an emerging trend in end-to-end automatic speech recognition research due to its advantages of being capable for online streaming speech recognition. However, RNN-T training is made difficult by the huge memory requirements, and complicated neural structure. A common solution to ease the RNN-T training is to employ c…

    Submitted 1 May, 2020; originally announced May 2020.

    Comments: Accepted by ICASSP 2020

  41. arXiv:2003.07482  [pdf, other]

    eess.AS cs.CL cs.SD

    High-Accuracy and Low-Latency Speech Recognition with Two-Head Contextual Layer Trajectory LSTM Model

    Authors: Jinyu Li, Rui Zhao, Eric Sun, Jeremy H. M. Wong, Amit Das, Zhong Meng, Yifan Gong

    Abstract: While the community keeps promoting end-to-end models over conventional hybrid models, which usually are long short-term memory (LSTM) models trained with a cross entropy criterion followed by a sequence discriminative training criterion, we argue that such conventional hybrid models can still be significantly improved. In this paper, we detail our recent efforts to improve conventional hybrid LST…

    Submitted 16 March, 2020; originally announced March 2020.

    Comments: Accepted by ICASSP 2020

  42. arXiv:2002.02859  [pdf, other]

    eess.SP

    Harvest-and-Opportunistically-Relay: Analyses on Transmission Outage and Covertness

    Authors: Yuanjian Li, Rui Zhao, Zhiqiao Nie, A. Hamid Aghvami

    Abstract: For enhancing transmission performance, privacy level and energy manipulating efficiency of wireless networks, this paper initiates a novel simultaneous wireless information and power transfer (SWIPT) full-duplex (FD) relaying protocol, termed harvest-and-opportunistically-relay (HOR). In the proposed HOR protocol, the relay can work opportunistically in either pure energy harvesting (PEH) or the…

    Submitted 7 February, 2020; originally announced February 2020.

    Comments: Single-column 43-page long, 12 figures, pre-print version on arXiv before publication for sharing ideas timely

  43. arXiv:1909.12415  [pdf, other]

    cs.CL eess.AS

    Improving RNN Transducer Modeling for End-to-End Speech Recognition

    Authors: Jinyu Li, Rui Zhao, Hu Hu, Yifan Gong

    Abstract: In the last few years, an emerging trend in automatic speech recognition research is the study of end-to-end (E2E) systems. Connectionist Temporal Classification (CTC), Attention Encoder-Decoder (AED), and RNN Transducer (RNN-T) are the most popular three methods. Among these three methods, RNN-T has the advantages to do online streaming which is challenging to AED and it doesn't have CTC's frame-…

    Submitted 26 September, 2019; originally announced September 2019.

    Comments: Accepted by IEEE ASRU workshop, 2019

  44. arXiv:1909.11936  [pdf]

    eess.IV cs.CV

    A Refined Equilibrium Generative Adversarial Network for Retinal Vessel Segmentation

    Authors: Yukun Zhou, Zailiang Chen, Hailan Shen, Xianxian Zheng, Rongchang Zhao, Xuanchu Duan

    Abstract: Objective: Recognizing retinal vessel abnormity is vital to early diagnosis of ophthalmological diseases and cardiovascular events. However, segmentation results are highly influenced by elusive vessels, especially in low-contrast background and lesion region. In this work, we present an end-to-end synthetic neural network, containing a symmetric equilibrium generative adversarial network (SEGAN),…

    Submitted 18 December, 2019; v1 submitted 26 September, 2019; originally announced September 2019.

    Comments: 12 pages, 8 figures, and 9 tables

  45. arXiv:1810.06055  [pdf, other]

    eess.IV cs.CV

    A Simple Change Comparison Method for Image Sequences Based on Uncertainty Coefficient

    Authors: Ruzhang Zhao, Yajun Fang, Berthold K. P. Horn

    Abstract: For identification of change information in image sequences, most studies focus on change detection in one image sequence, while few studies have considered the change level comparison between two different image sequences. Moreover, most studies require the detection of image information in details, for example, object detection. Based on Uncertainty Coefficient (UC), this paper proposes an innova…

    Submitted 14 October, 2018; originally announced October 2018.

    Comments: 5 pages, 5 figures, 2 tables, accepted as a conference paper at IEEE UV 2018, Boston, USA