-
Hierarchical Attention and Parallel Filter Fusion Network for Multi-Source Data Classification
Authors:
Han Luo,
Feng Gao,
Junyu Dong,
Lin Qi
Abstract:
Hyperspectral image (HSI) and synthetic aperture radar (SAR) data joint classification is a crucial yet challenging task in the field of remote sensing image interpretation. However, feature modeling in existing methods fails to exploit the abundant global, spectral, and local features simultaneously, leading to sub-optimal classification performance. To solve the problem, we propose a hierarchical attention and parallel filter fusion network for multi-source data classification. Concretely, we design a hierarchical attention module for hyperspectral feature extraction. This module integrates global, spectral, and local features simultaneously to provide a more comprehensive feature representation. In addition, we develop a parallel filter fusion module that enhances cross-modal feature interactions among different spatial locations in the frequency domain. Extensive experiments on two multi-source remote sensing data classification datasets verify the superiority of our proposed method over current state-of-the-art classification approaches. Specifically, our proposed method achieves an overall accuracy (OA) of 91.44% and 80.51% on the respective datasets.
Submitted 22 August, 2024;
originally announced August 2024.
-
Convergence of Symbiotic Communications and Blockchain for Sustainable and Trustworthy 6G Wireless Networks
Authors:
Haoxiang Luo,
Gang Sun,
Cheng Chi,
Hongfang Yu,
Mohsen Guizani
Abstract:
Symbiotic communication (SC) is known as a new wireless communication paradigm, similar to the natural ecosystem population, and can enable multiple communication systems to cooperate and mutualize through service exchange and resource sharing. As a result, SC is seen as an important potential technology for future sixth-generation (6G) communications, solving the problem of lack of spectrum resources and energy inefficiency. Symbiotic relationships among communication systems can complement radio resources in 6G. However, the absence of established trust relationships among diverse communication systems presents a formidable hurdle in ensuring efficient and trusted resource and service exchange within SC frameworks. To better realize trusted SC services in 6G, in this paper, we propose a solution that converges SC and blockchain, called a symbiotic blockchain network (SBN). Specifically, we first use cognitive backscatter communication to transform blockchain consensus, that is, the symbiotic blockchain consensus (SBC), so that it can be better suited for the wireless network. Then, for SBC, we propose a highly energy-efficient sharding scheme to meet the extremely low power consumption requirements in 6G. Finally, such a blockchain scheme guarantees trusted transactions of communication services in SC. Through ablation experiments, our proposed SBN demonstrates significant efficacy in mitigating energy consumption and reducing processing latency in adversarial networks, which is expected to achieve a sustainable and trusted 6G wireless network.
Submitted 11 August, 2024;
originally announced August 2024.
-
Power-LLaVA: Large Language and Vision Assistant for Power Transmission Line Inspection
Authors:
Jiahao Wang,
Mingxuan Li,
Haichen Luo,
Jinguo Zhu,
Aijun Yang,
Mingzhe Rong,
Xiaohua Wang
Abstract:
The inspection of power transmission lines has made notable progress in the past few years, primarily due to the integration of deep learning technology. However, current inspection approaches continue to encounter difficulties in generalization and intelligence, which restricts their further applicability. In this paper, we introduce Power-LLaVA, the first large language and vision assistant designed to offer professional and reliable inspection services for power transmission lines by engaging in dialogues with humans. Moreover, we construct a large-scale, high-quality dataset specialized for the inspection task. By employing a two-stage training strategy on the constructed dataset, Power-LLaVA demonstrates exceptional performance at a comparatively low training cost. Extensive experiments further demonstrate the capabilities of Power-LLaVA in power transmission line inspection. Code will be released.
Submitted 27 July, 2024;
originally announced July 2024.
-
6D Motion Parameters Estimation in Monostatic Integrated Sensing and Communications System
Authors:
Hongliang Luo,
Feifei Gao,
Fan Liu,
Shi Jin
Abstract:
In this paper, we propose a novel scheme to estimate the six dimensional (6D) motion parameters of dynamic target for monostatic integrated sensing and communications (ISAC) system. We first provide a generic ISAC framework for dynamic target sensing based on massive multiple input and multiple output (MIMO) array. Next, we derive the relationship between the sensing channel of ISAC base station (BS) and the 6D motion parameters of dynamic target. Then, we employ the array signal processing methods to estimate the horizontal angle, pitch angle, distance, and virtual velocity of dynamic target. Since the virtual velocities observed by different antennas are different, we adopt plane fitting to estimate the dynamic target's radial velocity, horizontal angular velocity, and pitch angular velocity from these virtual velocities. Simulation results demonstrate the effectiveness of the proposed 6D motion parameters estimation scheme, which also confirms a new finding that one single BS with massive MIMO array is capable of estimating the horizontal angular velocity and pitch angular velocity of dynamic target.
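The plane-fitting step mentioned in this abstract can be illustrated with an ordinary least-squares fit. This is only a minimal sketch under stated assumptions: the antenna grid, the synthetic virtual velocities, and the function name `fit_plane` are hypothetical, and the mapping from the fitted coefficients to radial and angular velocities is defined in the paper and is not reproduced here.

```python
import numpy as np

def fit_plane(x, y, v):
    """Least-squares fit of v ~ c0 + c1*x + c2*y over antenna positions (x, y)."""
    A = np.column_stack([np.ones_like(x), x, y])
    coef, *_ = np.linalg.lstsq(A, v, rcond=None)
    return coef

# Hypothetical 8x8 planar array indexed by antenna position.
gx, gy = np.meshgrid(np.arange(8.0), np.arange(8.0))
x, y = gx.ravel(), gy.ravel()

# Synthetic, noise-free virtual velocities that vary linearly across the array (assumed values).
true_coef = np.array([3.0, 0.02, -0.01])
v = true_coef[0] + true_coef[1] * x + true_coef[2] * y

est_coef = fit_plane(x, y, v)
print(est_coef)
```

With noisy per-antenna estimates, the same fit averages the noise over all antennas, which is presumably why a plane fit is preferred over differencing two antennas.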
Submitted 11 July, 2024;
originally announced July 2024.
-
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
Authors:
Keyu An,
Qian Chen,
Chong Deng,
Zhihao Du,
Changfeng Gao,
Zhifu Gao,
Yue Gu,
Ting He,
Hangrui Hu,
Kai Hu,
Shengpeng Ji,
Yabin Li,
Zerui Li,
Heng Lu,
Haoneng Luo,
Xiang Lv,
Bin Ma,
Ziyang Ma,
Chongjia Ni,
Changhe Song,
Jiaqi Shi,
Xian Shi,
Hao Wang,
Wen Wang,
Yuxuan Wang
, et al. (8 additional authors not shown)
Abstract:
This report introduces FunAudioLLM, a model family designed to enhance natural voice interactions between humans and large language models (LLMs). At its core are two innovative models: SenseVoice, which handles multilingual speech recognition, emotion recognition, and audio event detection; and CosyVoice, which facilitates natural speech generation with control over multiple languages, timbre, speaking style, and speaker identity. SenseVoice-Small delivers exceptionally low-latency ASR for 5 languages, and SenseVoice-Large supports high-precision ASR for over 50 languages, while CosyVoice excels in multi-lingual voice generation, zero-shot in-context learning, cross-lingual voice cloning, and instruction-following capabilities. The models related to SenseVoice and CosyVoice have been open-sourced on Modelscope and Huggingface, along with the corresponding training, inference, and fine-tuning codes released on GitHub. By integrating these models with LLMs, FunAudioLLM enables applications such as speech-to-speech translation, emotional voice chat, interactive podcasts, and expressive audiobook narration, thereby pushing the boundaries of voice interaction technology. Demos are available at https://fun-audio-llm.github.io, and the code can be accessed at https://github.com/FunAudioLLM.
Submitted 10 July, 2024; v1 submitted 4 July, 2024;
originally announced July 2024.
-
Trustworthy Enhanced Multi-view Multi-modal Alzheimer's Disease Prediction with Brain-wide Imaging Transcriptomics Data
Authors:
Shan Cong,
Zhoujie Fan,
Hongwei Liu,
Yinghan Zhang,
Xin Wang,
Haoran Luo,
Xiaohui Yao
Abstract:
Brain transcriptomics provides insights into the molecular mechanisms by which the brain coordinates its functions and processes. However, existing multimodal methods for predicting Alzheimer's disease (AD) primarily rely on imaging and sometimes genetic data, often neglecting the transcriptomic basis of the brain. Furthermore, while striving to integrate complementary information between modalities, most studies overlook the informativeness disparities between modalities. Here, we propose TMM, a trusted multiview multimodal graph attention framework for AD diagnosis, using extensive brain-wide transcriptomics and imaging data. First, we construct view-specific brain regional co-function networks (RRIs) from transcriptomics and multimodal radiomics data to incorporate interaction information from both biomolecular and imaging perspectives. Next, we apply graph attention (GAT) processing to each RRI network to produce graph embeddings and employ cross-modal attention to fuse the transcriptomics-derived embedding with each imaging-derived embedding. Finally, a novel true-false-harmonized class probability (TFCP) strategy is designed to assess and adaptively adjust the prediction confidence of each modality for AD diagnosis. We evaluate TMM using the AHBA database with brain-wide transcriptomics data and the ADNI database with three imaging modalities (AV45-PET, FDG-PET, and VBM-MRI). The results demonstrate the superiority of our method in identifying AD, EMCI, and LMCI compared to state-of-the-art methods. Code and data are available at https://github.com/Yaolab-fantastic/TMM.
Submitted 21 June, 2024;
originally announced June 2024.
-
Integrated Sensing and Communications Framework for 6G Networks
Authors:
Hongliang Luo,
Tengyu Zhang,
Chuanbin Zhao,
Yucong Wang,
Bo Lin,
Yuhua Jiang,
Dongqi Luo,
Feifei Gao
Abstract:
In this paper, we propose a novel integrated sensing and communications (ISAC) framework for the sixth generation (6G) mobile networks, in which we decompose the real physical world into static environment, dynamic targets, and various object materials. The ubiquitous static environment occupies the vast majority of the physical world, for which we design static environment reconstruction (SER) scheme to obtain the layout and point cloud information of static buildings. The dynamic targets floating in static environments create the spatiotemporal transition of the physical world, for which we design comprehensive dynamic target sensing (DTS) scheme to detect, estimate, track, image and recognize the dynamic targets in real-time. The object materials enrich the electromagnetic laws of the physical world, for which we develop object material recognition (OMR) scheme to estimate the electromagnetic coefficient of the objects. Besides, to integrate these sensing functions into existing communications systems, we discuss the interference issues and corresponding solutions for ISAC cellular networks. Furthermore, we develop an ISAC hardware prototype platform that can reconstruct the environmental maps and sense the dynamic targets while maintaining communications services. With all these designs, the proposed ISAC framework can support multifarious emerging applications, such as digital twins, low altitude economy, internet of vehicles, marine management, deformation monitoring, etc.
Submitted 30 May, 2024;
originally announced May 2024.
-
"Pass the butter": A study on desktop-classic multitasking robotic arm based on advanced YOLOv7 and BERT
Authors:
Haohua Que,
Wenbin Pan,
Jie Xu,
Hao Luo,
Pei Wang,
Li Zhang
Abstract:
In recent years, various intelligent autonomous robots have begun to appear in daily life and production. Desktop-level robots are characterized by their flexible deployment, rapid response, and suitability for light workload environments. In order to meet the current societal demand for service robot technology, this study proposes using a miniaturized desktop-level robot (by ROS) as a carrier, locally deploying a natural language model (NLP-BERT), and integrating visual recognition (CV-YOLO) and speech recognition technology (ASR-Whisper) as inputs to achieve autonomous decision-making and rational action by the desktop robot. Three comprehensive experiments were designed to validate the robotic arm, and the results demonstrate excellent performance using this approach across all three experiments. In Task 1, the execution rates for speech recognition and action performance were 92.6% and 84.3%, respectively. In Task 2, the highest execution rates under the given conditions reached 92.1% and 84.6%, while in Task 3, the highest execution rates were 95.2% and 80.8%, respectively. Therefore, it can be concluded that the proposed solution integrating ASR, NLP, and other technologies on edge devices is feasible and provides a technical and engineering foundation for realizing multimodal desktop-level robots.
Submitted 27 May, 2024;
originally announced May 2024.
-
Digital Twin Aided Compressive Sensing: Enabling Site-Specific MIMO Hybrid Precoding
Authors:
Hao Luo,
Ahmed Alkhateeb
Abstract:
Compressive sensing is a promising solution for the channel estimation in multiple-input multiple-output (MIMO) systems with large antenna arrays and constrained hardware. Utilizing site-specific channel data from real-world systems, deep learning can be employed to learn the compressive sensing measurement vectors with minimum redundancy, thereby focusing sensing power on promising spatial directions of the channel. Collecting real-world channel data, however, is challenging due to the high overhead resulting from the large number of antennas and hardware constraints. In this paper, we propose leveraging a site-specific digital twin to generate synthetic channel data, which shares a similar distribution with real-world data. The synthetic data is then used to train the deep learning models for learning measurement vectors and hybrid precoder/combiner design in an end-to-end manner. We further propose a model refinement approach to fine-tune the model pre-trained on the digital twin data with a small amount of real-world data. The evaluation results show that, by training the model on the digital twin data, the learned measurement vectors can be efficiently adapted to the environment geometry, leading to high performance of hybrid precoding for real-world deployments. Moreover, the model refinement approach can enable the digital twin aided model to achieve comparable performance to the model trained on the real-world dataset with a significantly reduced amount of real-world data.
Submitted 11 May, 2024;
originally announced May 2024.
-
Joint Power Allocation and Beamforming for In-band Full-duplex Multi-cell Multi-user Networks
Authors:
Haifeng Luo,
Navneet Garg,
Mark Holm,
Tharmalingam Ratnarajah
Abstract:
This paper investigates a robust joint power allocation and beamforming scheme for in-band full-duplex multi-cell multi-user (IBFD-MCMU) networks. A mean-squared error (MSE) minimization problem is formulated with constraints on the power budgets and residual self-interference (RSI) power. The problem is not convex, so we decompose it into two sub-problems: interference management beamforming and power allocation, and give closed-form solutions to the sub-problems. Then we propose an iterative algorithm to yield an overall solution. The computational complexity and convergence behavior of the algorithm are analyzed. Our method can enhance the analog self-interference cancellation (ASIC) depth provided by the precoder with less effect on the downlink communication than the existing null-space projection method, inspiring a low-cost but efficient IBFD transceiver design. It can achieve 42.9% of IBFD gain in terms of spectral efficiency with only antenna isolation, while this value increases to 60.9% with further digital self-interference cancellation (DSIC). Numerical results illustrate that our algorithm is robust to hardware impairments and channel uncertainty. With sufficient ASIC depth, our method reduces the computation time by at least 20% than the existing scheme due to its faster convergence speed at the cost of < 12.5% sum rate loss. The benefit is much more significant with single-antenna users that our algorithm saves at least 40% of the computation time at the cost of < 10% sum rate reduction.
Submitted 16 March, 2024;
originally announced March 2024.
-
On the Secrecy Rate of In-Band Full-duplex Two-way Wiretap Channel
Authors:
Navneet Garg,
Haifeng Luo,
Tharmalingam Ratnarajah
Abstract:
In this paper, we consider a two-way wiretap Multi-Input Multi-Output Multi-antenna Eve (MIMOME) channel, where both nodes (Alice and Bob) transmit and receive in an in-band full-duplex (IBFD) manner. For this system with keyless security, we provide a novel artificial noise (AN) based signal design, where the AN is injected in both signal and null spaces. We present an ergodic secrecy rate approximation to derive the power allocation algorithm. We consider scenarios where AN is known and unknown to legitimate users and include imperfect channel information effects. To maximize secrecy rates subject to the transmit power constraint, a two-step power allocation solution is proposed, where the first step is known at Eve, and the second step helps to improve the secrecy further. We also consider scenarios where partial information is known by Eve and the effects of non-ideal self-interference cancellation. The usefulness and limitations of the resulting power allocation solution are analyzed and verified via simulations. Results show that secrecy rates are lower when AN is unknown to receivers or Eve has more information about legitimate users. Since the ergodic approximation only considers Eve's distance, the resulting power allocation provides secrecy rates close to the actual ones.
Submitted 11 March, 2024;
originally announced March 2024.
-
Reinforcement Learning Based Robust Volt/Var Control in Active Distribution Networks With Imprecisely Known Delay
Authors:
Hong Cheng,
Huan Luo,
Zhi Liu,
Wei Sun,
Weitao Li,
Qiyue Li
Abstract:
Active distribution networks (ADNs) incorporating massive photovoltaic (PV) devices encounter challenges of rapid voltage fluctuations and potential violations. Due to the fluctuation and intermittency of PV generation, the state gap, arising from time-inconsistent states and exacerbated by imprecisely known system delays, significantly impacts the accuracy of voltage control. This paper addresses this challenge by introducing a framework for delay adaptive Volt/Var control (VVC) in the presence of imprecisely known system delays to regulate the reactive power of PV inverters. The proposed approach formulates the voltage control, based on predicted system operation states, as a robust VVC problem. It employs sample selection from the state prediction interval to promptly identify the worst-performing system operation state. Furthermore, we leverage the decentralized partially observable Markov decision process (Dec-POMDP) to reformulate the robust VVC problem. We design Multiple Policy Networks and employ Multiple Policy Networks and Reward Shaping-based Multi-agent Twin Delayed Deep Deterministic Policy Gradient (MPNRS-MATD3) algorithm to efficiently address and solve the Dec-POMDP model-based problem. Simulation results show the delay adaption characteristic of our proposed framework, and the MPNRS-MATD3 outperforms other multi-agent reinforcement learning algorithms in robust voltage control.
Submitted 27 February, 2024;
originally announced February 2024.
-
Integrated Imaging and Communication with Reconfigurable Intelligent Surfaces
Authors:
Hao Luo,
Ahmed Alkhateeb
Abstract:
Reconfigurable intelligent surfaces, with their large number of antennas, offer an interesting opportunity for high spatial-resolution imaging. In this paper, we propose a novel RIS-aided integrated imaging and communication system that can reduce the RIS beam training overhead for communication by leveraging the imaging of the surrounding environment. In particular, using the RIS as a wireless imaging device, our system constructs the scene depth map of the environment, including the mobile user. Then, we develop a user detection algorithm that subtracts the background and extracts the mobile user attributes from the depth map. These attributes are then utilized to design the RIS interaction vector and the beam selection strategy with low overhead. Simulation results show that the proposed approach can achieve comparable beamforming gain to the optimal/exhaustive beam selection solution while requiring 1000 times less beam training overhead.
Submitted 29 January, 2024;
originally announced January 2024.
-
ISAC with Backscattering RFID Tags: Joint Beamforming Design
Authors:
Hao Luo,
Umut Demirhan,
Ahmed Alkhateeb
Abstract:
In this paper, we explore an integrated sensing and communication (ISAC) system with backscattering RFID tags. In this setup, an access point employs a communication beam to serve a user while leveraging a sensing beam to detect an RFID tag. Under the total transmit power constraint of the system, our objective is to design sensing and communication beams by considering the tag detection and communication requirements. First, we adopt zero-forcing to design the beamforming vectors, followed by solving a convex optimization problem to determine the power allocation between sensing and communication. Then, we study a joint beamforming design problem with the goal of minimizing the total transmit power while satisfying the tag detection and communication requirements. To resolve this, we re-formulate the non-convex constraints into convex second-order cone constraints. The simulation results demonstrate that, under different communication SINR requirements, joint beamforming optimization outperforms the zero-forcing-based method in terms of achievable detection distance, offering a promising approach for the ISAC-backscattering systems.
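The zero-forcing baseline mentioned in this abstract can be sketched with a pseudo-inverse. This is an illustrative sketch only: the random channel matrix `H`, the antenna count, and the function name `zero_forcing_beams` are assumptions, and the paper's backscatter channel model and power allocation step are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
n_antennas = 8

# Hypothetical effective channels as rows: row 0 toward the user, row 1 toward the tag.
H = (rng.standard_normal((2, n_antennas))
     + 1j * rng.standard_normal((2, n_antennas))) / np.sqrt(2)

def zero_forcing_beams(H):
    """Return unit-norm beams (as columns) such that each beam is nulled on the other row of H."""
    W = np.linalg.pinv(H)  # right inverse: H @ W is the identity
    return W / np.linalg.norm(W, axis=0, keepdims=True)

W = zero_forcing_beams(H)
gains = H @ W  # diagonal: desired gains; off-diagonal: leakage (numerically zero)
print(np.abs(gains))
```

Zero-forcing spends degrees of freedom on exact nulls, which is consistent with the abstract's finding that a jointly optimized design can do better.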
Submitted 31 January, 2024; v1 submitted 18 January, 2024;
originally announced January 2024.
-
6D Radar Sensing and Tracking in Monostatic Integrated Sensing and Communications System
Authors:
Hongliang Luo,
Feifei Gao,
Fan Liu,
Shi Jin
Abstract:
In this paper, we propose a novel scheme for six-dimensional (6D) radar sensing and tracking of dynamic target based on multiple input and multiple output (MIMO) array for monostatic integrated sensing and communications (ISAC) system. Unlike most existing ISAC studies believing that only the radial velocity of far-field dynamic target can be measured based on one single base station (BS), we find that the sensing echo channel of MIMO-ISAC system actually includes the distance, horizontal angle, pitch angle, radial velocity, horizontal angular velocity, and pitch angular velocity of the dynamic target. Thus we may fully rely on one single BS to estimate the dynamic target's 6D motion parameters from the sensing echo signals. Specifically, we first propose the long-term motion and short-term motion model of dynamic target, in which the short-term motion model serves the single-shot sensing of dynamic target, while the long-term motion model serves multiple-shots tracking of dynamic target. As a step further, we derive the sensing channel model corresponding to the short-term motion. Next, for single-shot sensing, we employ the array signal processing methods to estimate the dynamic target's horizontal angle, pitch angle, distance, and virtual velocity. By realizing that the virtual velocities observed by different antennas are different, we adopt plane fitting to estimate the radial velocity, horizontal angular velocity, and pitch angular velocity of dynamic target. Furthermore, we implement the multiple-shots tracking of dynamic target based on each single-shot sensing results and Kalman filtering. Simulation results demonstrate the effectiveness of the proposed 6D radar sensing and tracking scheme.
Submitted 27 December, 2023;
originally announced December 2023.
-
Moving Target Sensing for ISAC Systems in Clutter Environment
Authors:
Dongqi Luo,
Huihui Wu,
Hongliang Luo,
Bo Lin,
Feifei Gao
Abstract:
In this paper, we consider the moving target sensing problem for integrated sensing and communication (ISAC) systems in clutter environment. Scatterers produce strong clutter, deteriorating the performance of ISAC systems in practice. Given that scatterers are typically stationary and the targets of interest are usually moving, we here focus on sensing the moving targets. Specifically, we adopt a scanning beam to search for moving target candidates. For the received signal in each scan, we employ high-pass filtering in the Doppler domain to suppress the clutter within the echo, thereby identifying candidate moving targets according to the power of filtered signal. Then, we adopt root-MUSIC-based algorithms to estimate the angle, range, and radial velocity of these candidate moving targets. Subsequently, we propose a target detection algorithm to reject false targets. Simulation results validate the effectiveness of these proposed methods.
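The idea of suppressing stationary clutter by high-pass filtering in the Doppler domain can be sketched as follows. This is a minimal illustration, not the paper's filter: it assumes the simplest possible high-pass (removing the zero-Doppler mean along slow time), and the pulse count, amplitudes, and the name `suppress_static_clutter` are hypothetical.

```python
import numpy as np

def suppress_static_clutter(echo):
    """Subtract the slow-time mean (axis 0), i.e. notch out the zero-Doppler component."""
    return echo - echo.mean(axis=0, keepdims=True)

n_pulses = 64
n = np.arange(n_pulses)

# Stationary scatterer: constant phasor across pulses (strong clutter, assumed amplitude).
clutter = 5.0 * np.ones(n_pulses, dtype=complex)
# Moving target: phase rotates from pulse to pulse, here placed exactly on Doppler bin 8.
target = np.exp(2j * np.pi * 8 * n / n_pulses)

echo = (clutter + target)[:, None]  # shape (slow time, range bins); one range bin here
filtered = suppress_static_clutter(echo)

spectrum = np.abs(np.fft.fft(filtered[:, 0]))
peak_bin = int(np.argmax(spectrum))
print(peak_bin, spectrum[0])
```

After filtering, the power left in each scan comes from moving scatterers only, so thresholding it gives candidate targets; slowly moving targets near zero Doppler are attenuated as well, which any such filter has to trade off.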
Submitted 3 November, 2023;
originally announced November 2023.
-
Integrated Sensing and Communications in Clutter Environment
Authors:
Hongliang Luo,
Yucong Wang,
Dongqi Luo,
Jianwei Zhao,
Huihui Wu,
Shaodan Ma,
Feifei Gao
Abstract:
In this paper, we propose a practical integrated sensing and communications (ISAC) framework to sense dynamic targets in a clutter environment while ensuring users' communication quality. To implement the communications function and the sensing function simultaneously, we design multiple communications beams that can communicate with the users as well as one sensing beam that can rotate and scan the entire space. To minimize the interference of the sensing beam on existing communications systems, we divide the service area into a sensing beam for sensing (S4S) sector and a communications beam for sensing (C4S) sector, and provide beamforming design and power allocation optimization strategies for each type of sector. Unlike most existing ISAC studies that ignore the interference of static environmental clutter on target sensing, we construct a mixed sensing channel model that includes both the static environment and dynamic targets. When the base station receives the echo signals, the mean phasor cancellation (MPC) method is employed to filter out the interference from static environmental clutter and to extract the effective dynamic target echoes. Then a complete and practical dynamic target sensing scheme is designed to detect the presence of dynamic targets and to estimate their angles, distances, and velocities. In particular, dynamic target detection and angle estimation are realized through angle-Doppler spectrum estimation (ADSE) and joint detection over multiple subcarriers (MSJD), while distance and velocity estimation are realized through the extended subspace algorithm. Simulation results demonstrate the effectiveness of the proposed scheme and its superiority over existing methods that ignore environmental clutter.
Submitted 5 February, 2024; v1 submitted 2 November, 2023;
originally announced November 2023.
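The clutter-suppression idea in the abstract above can be illustrated with a small sketch. It assumes that mean phasor cancellation amounts to subtracting the slow-time average from each echo sample, so that any zero-Doppler (static) component is removed. This is an illustrative reading of the abstract, not the authors' implementation. The function name `mean_phasor_cancel` and all signal parameters are hypothetical.

```python
import cmath

def mean_phasor_cancel(echo):
    """Subtract the slow-time mean phasor from every sample.

    A static scatterer contributes the same phasor to every sample, so it
    is removed by the subtraction. Assumption: this is what "MPC" denotes.
    """
    mean = sum(echo) / len(echo)
    return [x - mean for x in echo]

# Hypothetical echo: a strong static clutter phasor plus a weaker moving
# target whose Doppler completes exactly 8 cycles over the 64 samples.
N = 64
static = 5.0 * cmath.exp(1j * 0.7)
moving = [0.5 * cmath.exp(2j * cmath.pi * 8 * n / N) for n in range(N)]
echo = [static + m for m in moving]

filtered = mean_phasor_cancel(echo)

# Largest deviation between the filtered echo and the moving-target part.
residual = max(abs(f - m) for f, m in zip(filtered, moving))
```

In this toy case the target Doppler completes an integer number of cycles over the window, so its own mean is zero and the residual is at the level of floating-point error. A target with near-zero Doppler would be partly cancelled along with the clutter.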
-
Deep Learning Enables Large Depth-of-Field Images for Sub-Diffraction-Limit Scanning Superlens Microscopy
Authors:
Hui Sun,
Hao Luo,
Feifei Wang,
Qingjiu Chen,
Meng Chen,
Xiaoduo Wang,
Haibo Yu,
Guanglie Zhang,
Lianqing Liu,
Jianping Wang,
Dapeng Wu,
Wen Jung Li
Abstract:
Scanning electron microscopy (SEM) is indispensable in diverse applications ranging from microelectronics to food processing because it provides large depth-of-field images with a resolution beyond the optical diffraction limit. However, the technology requires coating conductive films on insulator samples and a vacuum environment. We use deep learning to obtain the mapping relationship between optical super-resolution (OSR) images and SEM domain images, which enables the transformation of OSR images into SEM-like large depth-of-field images. Our custom-built scanning superlens microscopy (SSUM) system, which requires neither coating samples by conductive films nor a vacuum environment, is used to acquire the OSR images with features down to ~80 nm. The peak signal-to-noise ratio (PSNR) and structural similarity index measure values indicate that the deep learning method performs excellently in image-to-image translation, with a PSNR improvement of about 0.74 dB over the optical super-resolution images. The proposed method provides a high level of detail in the reconstructed results, indicating that it has broad applicability to chip-level defect detection, biological sample analysis, forensics, and various other fields.
Submitted 27 October, 2023;
originally announced October 2023.
-
Phase Synchrony Component Self-Organization in Brain Computer Interface
Authors:
Xu Niu,
Na Lu,
Huan Luo,
Ruofan Yan
Abstract:
Phase synchrony information plays a crucial role in analyzing functional brain connectivity and identifying brain activities. A widely adopted feature extraction pipeline, composed of preprocessing, selection of EEG acquisition channels, and phase locking value (PLV) calculation, has achieved success in motor imagery (MI) classification. However, this pipeline is manual and reliant on expert knowledge, limiting its convenience and adaptability to different application scenarios. Moreover, most studies have employed mediocre data-independent spatial filters to suppress noise, impeding the exploration of more significant phase synchronization phenomena. To address these issues, we propose the concept of phase synchrony component self-organization, which enables the adaptive learning of data-dependent spatial filters for automating both the preprocessing and channel selection procedures. Based on this concept, the first deep learning end-to-end network is developed, which directly extracts phase synchrony-based features from raw EEG signals and performs classification. The network learns optimal filters during training, which are obtained when the network achieves peak classification results. Extensive experiments have demonstrated that our network outperforms state-of-the-art methods. Remarkably, through the learned optimal filters, significant phase synchronization phenomena can be observed. Specifically, by calculating the PLV between a pair of signals extracted from each sample using two of the learned spatial filters, we have obtained an average PLV exceeding 0.87 across all tongue MI samples. This high PLV indicates a groundbreaking discovery in the synchrony pattern of tongue MI.
Submitted 11 October, 2023; v1 submitted 21 September, 2023;
originally announced October 2023.
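For readers unfamiliar with the phase locking value mentioned in the abstract above, here is a minimal sketch using the standard definition, PLV = |mean(exp(j(φ_a − φ_b)))|. It assumes the instantaneous phases have already been extracted (for example with a Hilbert transform, which is not shown), and it does not reproduce the learned spatial filters from the paper. The function name and the toy phase sequences are hypothetical.

```python
import cmath
import math

def phase_locking_value(phase_a, phase_b):
    """Return |mean(exp(j * (phase_a - phase_b)))| over all time samples."""
    if len(phase_a) != len(phase_b) or not phase_a:
        raise ValueError("phase sequences must be non-empty and equal in length")
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phase_a, phase_b))
    return abs(total) / len(phase_a)

# Toy phases in radians (illustrative values only).
N = 200
locked_a = [0.3 * n for n in range(N)]
# Constant lag of 0.5 rad: the phase difference never changes.
locked_b = [p - 0.5 for p in locked_a]
# Phase difference sweeps once around the full circle.
drifting_b = [p - 2 * math.pi * n / N for n, p in enumerate(locked_a)]

plv_locked = phase_locking_value(locked_a, locked_b)
plv_drifting = phase_locking_value(locked_a, drifting_b)
```

A constant phase difference gives a PLV of essentially 1, while a difference spread evenly around the circle gives a PLV of essentially 0. Values such as the 0.87 reported in the abstract fall between these extremes.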
-
Joint Audio and Speech Understanding
Authors:
Yuan Gong,
Alexander H. Liu,
Hongyin Luo,
Leonid Karlinsky,
James Glass
Abstract:
Humans are surrounded by audio signals that include both speech and non-speech sounds. The recognition and understanding of speech and non-speech audio events, along with a profound comprehension of the relationship between them, constitute fundamental cognitive capabilities. For the first time, we build a machine learning model, called LTU-AS, that has a conceptually similar universal audio perception and advanced reasoning ability. Specifically, by integrating Whisper as a perception module and LLaMA as a reasoning module, LTU-AS can simultaneously recognize and jointly understand spoken text, speech paralinguistics, and non-speech audio events - almost everything perceivable from audio signals.
Submitted 10 December, 2023; v1 submitted 25 September, 2023;
originally announced September 2023.
-
Beam Squint Assisted User Localization in Near-Field Integrated Sensing and Communications Systems
Authors:
Hongliang Luo,
Feifei Gao,
Wanmai Yuan,
Shun Zhang
Abstract:
Integrated sensing and communication (ISAC) has been regarded as a key technology for 6G wireless communications, in which large-scale multiple input and multiple output (MIMO) arrays with higher and wider frequency bands will be adopted. However, recent studies show that the beam squint phenomenon cannot be ignored in wideband MIMO systems, where it generally deteriorates the communications performance. In this paper, we find that with the aid of true-time-delay lines (TTDs), the range and trajectory of the beam squint in near-field communications systems can be freely controlled, and hence it is possible to reversely utilize the beam squint for user localization. We derive the trajectory equation for near-field beam squint points and design a way to control such a trajectory. With the proposed design, beamforming from different subcarriers would purposely point to different angles and different distances, such that users at different positions would receive the maximum power at different subcarriers. Hence, one can simply localize multiple users from the beam squint effect in the frequency domain, and thus reduce the beam sweeping overhead as compared to the conventional time-domain beam-search-based approach. Furthermore, we utilize the phase difference of the maximum power subcarriers received by the user at different frequencies over several beam sweeps to obtain a more accurate distance estimation result, ultimately realizing user localization with high accuracy and low beam sweeping overhead. Simulation results demonstrate the effectiveness of the proposed schemes.
Submitted 25 September, 2023;
originally announced September 2023.
-
Millimeter Wave V2V Beam Tracking using Radar: Algorithms and Real-World Demonstration
Authors:
Hao Luo,
Umut Demirhan,
Ahmed Alkhateeb
Abstract:
Utilizing radar sensing for assisting communication has attracted increasing interest thanks to its potential in dynamic environments. A particularly interesting problem for this approach appears in the vehicle-to-vehicle (V2V) millimeter wave and terahertz communication scenarios, where the narrow beams change with the movement of both vehicles. To address this problem, in this work, we develop a radar-aided beam-tracking framework, where a single initial beam and a set of radar measurements over a period of time are utilized to predict the future beams after this time duration. Within this framework, we develop two approaches with the combination of various degrees of radar signal processing and machine learning. To evaluate the feasibility of the solutions in a realistic scenario, we test their performance on a real-world V2V dataset. Our results indicated the importance of high angular resolution radar for this task and affirmed the potential of using radar for the V2V beam management problems.
Submitted 27 October, 2023; v1 submitted 3 August, 2023;
originally announced August 2023.
-
YOLO: An Efficient Terahertz Band Integrated Sensing and Communications Scheme with Beam Squint
Authors:
Hongliang Luo,
Feifei Gao,
Hai Lin,
Shaodan Ma,
H. Vincent Poor
Abstract:
Using communications signals for dynamic target sensing is an important component of integrated sensing and communications (ISAC). In this paper, we propose to utilize the beam squint effect to realize fast non-cooperative dynamic target sensing in massive multiple input and multiple output (MIMO) Terahertz band communications systems. Specifically, we construct a wideband channel model of the echo signals, and design a beamforming strategy that controls the range of beam squint by adjusting the values of phase shifters and true time delay lines. With this design, beams at different subcarriers can be aligned along different directions in a planned way. Then the received echo signals at different subcarriers will carry target information in different directions, based on which the targets' angles can be estimated through a carefully designed algorithm. Moreover, we propose a supporting method based on extended array signal estimation, which utilizes the phase changes of different frequency subcarriers within different OFDM symbols to estimate the distance and velocity of dynamic targets. Interestingly, the proposed sensing scheme only needs to transmit and receive the signals once, which can be termed You Only Listen Once (YOLO). Compared with the traditional ISAC method that requires time-consuming beam sweeping, the proposed one greatly reduces the sensing overhead. Simulation results are provided to demonstrate the effectiveness of the proposed scheme.
Submitted 5 February, 2024; v1 submitted 19 May, 2023;
originally announced May 2023.
-
FunASR: A Fundamental End-to-End Speech Recognition Toolkit
Authors:
Zhifu Gao,
Zerui Li,
Jiaming Wang,
Haoneng Luo,
Xian Shi,
Mengzhe Chen,
Yabin Li,
Lingyun Zuo,
Zhihao Du,
Zhangyu Xiao,
Shiliang Zhang
Abstract:
This paper introduces FunASR, an open-source speech recognition toolkit designed to bridge the gap between academic research and industrial applications. FunASR offers models trained on large-scale industrial corpora and the ability to deploy them in applications. The toolkit's flagship model, Paraformer, is a non-autoregressive end-to-end speech recognition model that has been trained on a manually annotated Mandarin speech recognition dataset that contains 60,000 hours of speech. To improve the performance of Paraformer, we have added timestamp prediction and hotword customization capabilities to the standard Paraformer backbone. In addition, to facilitate model deployment, we have open-sourced a voice activity detection model based on the Feedforward Sequential Memory Network (FSMN-VAD) and a text post-processing punctuation model based on the controllable time-delay Transformer (CT-Transformer), both of which were trained on industrial corpora. These functional modules provide a solid foundation for building high-precision long audio speech recognition services. Compared to other models trained on open datasets, Paraformer demonstrates superior performance.
Submitted 18 May, 2023;
originally announced May 2023.
-
Listen, Think, and Understand
Authors:
Yuan Gong,
Hongyin Luo,
Alexander H. Liu,
Leonid Karlinsky,
James Glass
Abstract:
The ability of artificial intelligence (AI) systems to perceive and comprehend audio signals is crucial for many applications. Although significant progress has been made in this area since the development of AudioSet, most existing models are designed to map audio inputs to pre-defined, discrete sound label sets. In contrast, humans possess the ability to not only classify sounds into general categories, but also to listen to the finer details of the sounds, explain the reason for the predictions, think about what the sound infers, and understand the scene and what action needs to be taken, if any. Such capabilities beyond perception are not yet present in existing audio models. On the other hand, modern large language models (LLMs) exhibit emerging reasoning ability but they lack audio perception capabilities. Therefore, we ask the question: can we build a model that has both audio perception and a reasoning ability?
In this paper, we propose a new audio foundation model, called LTU (Listen, Think, and Understand). To train LTU, we created a new OpenAQA-5M dataset consisting of 1.9 million closed-ended and 3.7 million open-ended, diverse (audio, question, answer) tuples, and have used an autoregressive training framework with a perception-to-understanding curriculum. LTU demonstrates strong performance and generalization ability on conventional audio tasks such as classification and captioning. More importantly, it exhibits emerging audio reasoning and comprehension abilities that are absent in existing audio models. To the best of our knowledge, LTU is one of the first multimodal large language models that focus on general audio (rather than just speech) understanding.
Submitted 19 February, 2024; v1 submitted 18 May, 2023;
originally announced May 2023.
-
Accurate and Reliable Confidence Estimation Based on Non-Autoregressive End-to-End Speech Recognition System
Authors:
Xian Shi,
Haoneng Luo,
Zhifu Gao,
Shiliang Zhang,
Zhijie Yan
Abstract:
Estimating confidence scores for recognition results is a classic task in the ASR field and of vital importance for various downstream tasks and training strategies. Previous end-to-end (E2E) based confidence estimation models (CEM) predict score sequences of the same length as the input transcriptions, leading to unreliable estimation when deletion and insertion errors occur. In this paper, we propose the CIF-Aligned confidence estimation model (CA-CEM) to achieve accurate and reliable confidence estimation based on a novel non-autoregressive E2E ASR model, Paraformer. CA-CEM utilizes the modeling characteristics of the continuous integrate-and-fire (CIF) mechanism to generate token-synchronous acoustic embeddings, which solves the estimation failure issue above. We measure the quality of estimation with AUC and RMSE at the token level and with ECE-U, a proposed metric, at the utterance level. CA-CEM achieves 24% and 19% relative reductions in ECE-U, along with better AUC and RMSE, on two test sets. Furthermore, we conduct an analysis to explore the potential of CEM for different ASR-related uses.
Submitted 24 May, 2023; v1 submitted 17 May, 2023;
originally announced May 2023.
-
Inflation Reduction Act impacts on the economics of clean hydrogen and liquid fuels
Authors:
Fangwei Cheng,
Hongxi Luo,
Jesse D. Jenkins,
Eric D. Larson
Abstract:
The Inflation Reduction Act (IRA) in the United States provides unprecedented incentives for deploying low-carbon hydrogen and liquid fuels, among other low greenhouse gas (GHG) emissions technologies. To better understand the prospective competitiveness of low-carbon or negative-carbon hydrogen and liquid fuels under the IRA in the early 2030s, we examine the impacts of IRA provisions on costs of producing hydrogen and synthetic liquid fuel made from natural gas, electricity, short-cycle biomass (agricultural residues), and corn-ethanol. With IRA credits (45V or 45Q), but excluding incentives provided by other national or state policies, hydrogen produced by electrolysis using carbon-free electricity (green H2) and natural gas reforming with carbon capture and storage (CCS) (blue H2) are cost-competitive with the carbon-intensive benchmark gray H2 from steam methane reforming. Biomass-derived H2 with or without CCS is not cost-competitive under current IRA provisions. However, if IRA allowed biomass gasification with CCS to claim a 45V credit for carbon-neutral H2 and a 45Q credit for negative biogenic-CO2 emissions, this pathway would be less costly than gray H2. The IRA credit for clean fuels (45Z), currently stipulated to end in 2027, would need to be extended, or similar policy support provided by other national or state policies, for clean synthetic liquid fuel to be cost-competitive with petroleum-derived liquid fuels. Levelized IRA subsidies per unit of CO2 mitigated for all hydrogen and synthetic liquid fuel production pathways, except electricity-derived synthetic liquid fuel, range from 65 to 384 $/t CO2, which is within or below the range in U.S. federal government estimates of the Social Cost of Carbon (SCC) in the 2030 to 2040 timeframe.
Submitted 14 August, 2023; v1 submitted 1 May, 2023;
originally announced May 2023.
-
ESCM: An Efficient and Secure Communication Mechanism for UAV Networks
Authors:
Haoxiang Luo,
Yifan Wu,
Gang Sun,
Hongfang Yu,
Mohsen Guizani
Abstract:
UAV (unmanned aerial vehicle) is rapidly gaining traction in various human activities and has become an integral component of the satellite-air-ground-sea (SAGS) integrated network. As high-speed moving objects, UAVs not only have extremely strict requirements for communication delay, but also cannot be maliciously controlled as a weapon by the attacker. Therefore, an efficient and secure communication method designed for UAV networks is necessary. We propose a communication mechanism ESCM. For high efficiency, ESCM provides a routing protocol based on the artificial bee colony (ABC) algorithm to accelerate communications between UAVs. Meanwhile, we use blockchain to guarantee the security of UAV networks. However, blockchain has unstable links in high-mobility networks resulting in low consensus efficiency and high communication overhead. Consequently, ESCM introduces digital twin (DT), which transforms the UAV network into a static network by mapping UAVs from the physical world into Cyberspace. This virtual UAV network is called CyberUAV. Then, in CyberUAV, we design a blockchain consensus based on network coding, named Proof of Network Coding (PoNC). Analysis and simulation show that the above modules in ESCM have advantages over existing schemes. Through ablation studies, we demonstrate that these modules are indispensable for efficient and secure communication of UAV networks.
Submitted 16 June, 2023; v1 submitted 25 April, 2023;
originally announced April 2023.
-
Performance Analysis and Comparison of Non-ideal Wireless PBFT and RAFT Consensus Networks in 6G Communications
Authors:
Haoxiang Luo,
Xiangyue Yang,
Hongfang Yu,
Gang Sun,
Bo Lei,
Mohsen Guizani
Abstract:
Due to advantages in security and privacy, blockchain is considered a key enabling technology to support 6G communications. Practical Byzantine Fault Tolerance (PBFT) and RAFT are seen as the most applicable consensus mechanisms (CMs) in blockchain-enabled wireless networks. However, previous studies on PBFT and RAFT rarely consider the channel performance of the physical layer, such as path loss and channel fading, resulting in research results that are far from real networks. Additionally, 6G communications will widely deploy high-frequency signals such as terahertz (THz) and millimeter wave (mmWave), while the performance of PBFT and RAFT is still unknown when these signals are transmitted in wireless PBFT or RAFT networks. Therefore, it is urgent to study the performance of non-ideal wireless PBFT and RAFT networks with THz and mmWave signals, so that PBFT and RAFT can play a better role in the 6G era. In this paper, we study and compare the performance of THz and mmWave signals in non-ideal wireless PBFT and RAFT networks, considering Rayleigh Fading (RF) and close-in Free Space (FS) reference distance path loss. Performance is evaluated by five metrics: consensus success rate, latency, throughput, reliability gain, and energy consumption. Meanwhile, we find and derive that there is a maximum distance between two nodes that can make CMs inevitably successful, and it is named the active distance of CMs. The research results analyze the performance of non-ideal wireless PBFT and RAFT networks, and provide important references for the future transmission of THz and mmWave signals in PBFT and RAFT networks.
Submitted 2 August, 2023; v1 submitted 17 April, 2023;
originally announced April 2023.
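The channel model named in the abstract above can be sketched with the commonly used close-in (CI) free-space reference distance formula, PL(f, d) = FSPL(f, 1 m) + 10 n log10(d / 1 m), where FSPL(f, 1 m) = 20 log10(4πf / c). The carrier frequencies, path loss exponent, transmit power, and function names below are illustrative assumptions and are not taken from the paper. Rayleigh fading is modeled, again as an assumption, by a unit-mean exponentially distributed power gain.

```python
import math
import random

SPEED_OF_LIGHT = 299_792_458.0  # m/s

def ci_path_loss_db(freq_hz, distance_m, exponent):
    """Close-in path loss in dB with a 1 m free-space reference distance."""
    if distance_m < 1.0:
        raise ValueError("the CI model is referenced to distances of at least 1 m")
    fspl_1m = 20.0 * math.log10(4.0 * math.pi * freq_hz / SPEED_OF_LIGHT)
    return fspl_1m + 10.0 * exponent * math.log10(distance_m)

def received_power_dbm(tx_power_dbm, freq_hz, distance_m, exponent, fading_gain=1.0):
    """Received power after CI path loss and a linear fading power gain |h|^2."""
    loss = ci_path_loss_db(freq_hz, distance_m, exponent)
    return tx_power_dbm - loss + 10.0 * math.log10(fading_gain)

# Illustrative comparison at 10 m with a free-space-like exponent of 2.
mmwave_loss = ci_path_loss_db(28e9, 10.0, 2.0)   # hypothetical mmWave carrier
thz_loss = ci_path_loss_db(140e9, 10.0, 2.0)     # hypothetical sub-THz carrier

# One Rayleigh-faded link: |h|^2 drawn from a unit-mean exponential.
rng = random.Random(0)
faded_rx = received_power_dbm(20.0, 28e9, 10.0, 2.0, rng.expovariate(1.0))
```

At the 1 m reference distance the 28 GHz loss is roughly 61.4 dB, and every tenfold increase in distance adds 10n dB. At equal distance the higher carrier always incurs the larger loss, which is consistent with the abstract's notion of a maximum node separation for reliable consensus.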
-
Performance Analysis of Non-ideal Wireless PBFT Networks with mmWave and Terahertz Signals
Authors:
Haoxiang Luo,
Xiangyue Yang,
Hongfang Yu,
Gang Sun,
Shizhong Xu,
Long Luo
Abstract:
Due to advantages in security and privacy, blockchain is considered a key enabling technology to support 6G communications. Practical Byzantine Fault Tolerance (PBFT) is seen as the most applicable consensus mechanism in blockchain-enabled wireless networks. However, previous studies on PBFT do not consider the channel performance of the physical layer, such as path loss and channel fading, resulting in research results that are far from real networks. Additionally, 6G communications will widely deploy high-frequency signals such as millimeter wave (mmWave) and terahertz (THz), while the performance of PBFT is still unknown when these signals are transmitted in wireless PBFT networks. Therefore, it is urgent to study the performance of non-ideal wireless PBFT networks with mmWave and THz signals, so that PBFT can play a better role in the 6G era. In this paper, we study and compare the performance of mmWave and THz signals in non-ideal wireless PBFT networks, considering Rayleigh Fading (RF) and close-in Free Space (FS) reference distance path loss. Performance is evaluated by consensus success rate and delay. Meanwhile, we find and derive that there is a maximum distance between two nodes that can make PBFT consensus inevitably successful, and it is named the active distance of PBFT in this paper. The research results not only analyze the performance of non-ideal wireless PBFT networks, but also provide an important reference for the future transmission of mmWave and THz signals in PBFT networks.
Submitted 28 March, 2023;
originally announced March 2023.
-
RIS-Aided Integrated Sensing and Communication: Joint Beamforming and Reflection Design
Authors:
Honghao Luo,
Rang Liu,
Ming Li,
Qian Liu
Abstract:
Integrated sensing and communication (ISAC) has been envisioned as a promising technique to alleviate the spectrum congestion problem. Inspired by the applications of reconfigurable intelligent surface (RIS) in dynamically manipulating the wireless propagation environment, in this paper, we investigate deploying a RIS in an ISAC system to pursue performance improvement. Particularly, we consider a RIS-assisted ISAC system where a multi-antenna base station (BS) performs multi-target detection and multi-user communication with the assistance of a RIS. Our goal is to maximize the weighted summation of target detection signal-to-noise ratios (SNRs) by jointly optimizing the transmit beamforming and the RIS reflection coefficients, while satisfying the communication quality-of-service (QoS) requirement, the total transmit power budget, and the restriction of RIS phase-shift. An efficient alternating optimization algorithm combining the majorization-minimization (MM), penalty-based, and manifold optimization methods is developed to solve the resulting complicated non-convex optimization problem. Simulation results illustrate the advantages of deploying RIS in ISAC systems and the effectiveness of our proposed algorithm.
Submitted 22 February, 2023;
originally announced February 2023.
-
Interpretable Spectrum Transformation Attacks to Speaker Recognition
Authors:
Jiadi Yao,
Hong Luo,
Xiao-Lei Zhang
Abstract:
The success of adversarial attacks on speaker recognition is mainly in white-box scenarios. When applying the adversarial voices that are generated by attacking white-box surrogate models to black-box victim models, i.e. transfer-based black-box attacks, the transferability of the adversarial voices is not only far from satisfactory, but also lacks an interpretable basis. To address these issues, in this paper, we propose a general framework, named spectral transformation attack based on modified discrete cosine transform (STA-MDCT), to improve the transferability of the adversarial voices to a black-box victim model. Specifically, we first apply MDCT to the input voice. Then, we slightly modify the energy of different frequency bands for capturing the salient regions of the adversarial noise in the time-frequency domain that are critical to a successful attack. Unlike existing approaches that operate on voices in the time domain, the proposed framework operates on voices in the time-frequency domain, which improves the interpretability, transferability, and imperceptibility of the attack. Moreover, it can be implemented with any gradient-based attackers. To utilize the advantage of model ensembling, we not only implement STA-MDCT with a single white-box surrogate model, but also with an ensemble of surrogate models. Finally, we visualize the saliency maps of adversarial voices by the class activation maps (CAM), which offers an interpretable basis to transfer-based attacks in speaker recognition for the first time. Extensive comparison results with five representative attackers show that the CAM visualization clearly explains the effectiveness of STA-MDCT, and the weaknesses of the comparison methods; the proposed method outperforms the comparison methods by a large margin.
Submitted 21 February, 2023;
originally announced February 2023.
-
Incipient Fault Detection in Power Distribution System: A Time-Frequency Embedded Deep Learning Based Approach
Authors:
Qiyue Li,
Huan Luo,
Hong Cheng,
Yuxing Deng,
Wei Sun,
Weitao Li,
Zhi Liu
Abstract:
Incipient fault detection in power distribution systems is crucial to improve the reliability of the grid. However, the non-stationary nature and the inadequacy of the training dataset due to the self-recovery of the incipient fault signal make incipient fault detection in power distribution systems a great challenge. In this paper, we focus on incipient fault detection in power distribution systems and address the above challenges. In particular, we propose an ADaptive Time-Frequency Memory (AD-TFM) cell by embedding wavelet transform into the Long Short-Term Memory (LSTM), to extract features in the time and frequency domains from the non-stationary incipient fault signals. We make the scale parameters and translation parameters of the wavelet transform learnable to adapt to the dynamic input signals. Based on the stacked AD-TFM cells, we design a recurrent neural network with ATtention mechanism, named AD-TFM-AT model, to detect incipient faults with multi-resolution and multi-dimension analysis. In addition, we propose two data augmentation methods, namely phase switching and temporal sliding, to effectively enlarge the training datasets. Experimental results on two open datasets show that our proposed AD-TFM-AT model and data augmentation methods achieve state-of-the-art (SOTA) performance of incipient fault detection in power distribution systems. We also disclose one used dataset logged at State Grid Corporation of China to facilitate future research.
Submitted 18 February, 2023;
originally announced February 2023.
-
Reinforcement learning for traffic signal control in hybrid action space
Authors:
Haoqing Luo,
Sheng Jin
Abstract:
The prevailing reinforcement-learning-based traffic signal control methods are typically staging-optimizable or duration-optimizable, depending on the action spaces. In this paper, we propose a novel control architecture, TBO, which is based on hybrid proximal policy optimization. To the best of our knowledge, TBO is the first RL-based algorithm to implement synchronous optimization of the staging and duration. Compared to discrete and continuous action spaces, hybrid action space is a merged search space, in which TBO better implements the trade-off between frequent switching and unsaturated release. Experiments are given to demonstrate that TBO reduces the queue length and delay by 13.78% and 14.08% on average, respectively, compared to the existing baselines. Furthermore, we calculate the Gini coefficients of the right-of-way to indicate TBO does not harm fairness while improving efficiency.
Submitted 25 November, 2022; v1 submitted 23 November, 2022;
originally announced November 2022.
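The abstract above uses Gini coefficients of the right-of-way as a fairness indicator. As a minimal sketch of how such a coefficient can be computed, assuming the right-of-way is summarized as one non-negative number per approach (for example, granted green time), the following uses the standard sorted-sample formula. The function name `gini` and the sample values are hypothetical and not taken from the paper.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending-sorted values (ranks start at 1).
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n


# Hypothetical green-time totals (seconds) for four approaches.
equal_share = gini([30, 30, 30, 30])
one_sided = gini([0, 0, 0, 120])
```

A value near 0 means the right-of-way is shared evenly; with four approaches, the most unequal allocation gives 0.75 under this formula.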
-
Reconfigurable Intelligent Surface Aided Wireless Sensing for Scene Depth Estimation
Authors:
Abdelrahman Taha,
Hao Luo,
Ahmed Alkhateeb
Abstract:
Current scene depth estimation approaches mainly rely on optical sensing, which carries privacy concerns and suffers from estimation ambiguity for distant, shiny, and transparent surfaces/objects. Reconfigurable intelligent surfaces (RISs) provide a path for employing a massive number of antennas using low-cost and energy-efficient architectures. This has the potential for realizing RIS-aided wireless sensing with high spatial resolution. In this paper, we propose to employ RIS-aided wireless sensing systems for scene depth estimation. We develop a comprehensive framework for building accurate depth maps using RIS-aided mmWave sensing systems. In this framework, we propose a new RIS interaction codebook capable of creating a sensing grid of reflected beams that meets the desirable characteristics of efficient scene depth map construction. Using the designed codebook, the received signals are processed to build high-resolution depth maps. Simulation results compare the proposed solution against RGB-based approaches and highlight the promise of adopting RIS-aided mmWave sensing in scene depth perception.
Submitted 15 November, 2022;
originally announced November 2022.
-
3D Matting: A Benchmark Study on Soft Segmentation Method for Pulmonary Nodules Applied in Computed Tomography
Authors:
Lin Wang,
Xiufen Ye,
Donghao Zhang,
Wanji He,
Lie Ju,
Yi Luo,
Huan Luo,
Xin Wang,
Wei Feng,
Kaimin Song,
Xin Zhao,
Zongyuan Ge
Abstract:
Usually, lesions are not isolated but are associated with the surrounding tissues. For example, the growth of a tumour can depend on or infiltrate into the surrounding tissues. Due to the pathological nature of the lesions, it is challenging to distinguish their boundaries in medical imaging. However, these uncertain regions may contain diagnostic information. Therefore, the simple binarization of lesions by traditional binary segmentation can result in the loss of diagnostic information. In this work, we introduce the image matting into the 3D scenes and use the alpha matte, i.e., a soft mask, to describe lesions in a 3D medical image. The traditional soft mask acted as a training trick to compensate for the easily mislabelled or under-labelled ambiguous regions. In contrast, 3D matting uses soft segmentation to characterize the uncertain regions more finely, which means that it retains more structural information for subsequent diagnosis and treatment. The current study of image matting methods in 3D is limited. To address this issue, we conduct a comprehensive study of 3D matting, including both traditional and deep-learning-based methods. We adapt four state-of-the-art 2D image matting algorithms to 3D scenes and further customize the methods for CT images to calibrate the alpha matte with the radiodensity. Moreover, we propose the first end-to-end deep 3D matting network and implement a solid 3D medical image matting benchmark. Its efficient counterparts are also proposed to achieve a good performance-computation balance. Furthermore, there is no high-quality annotated dataset related to 3D matting, slowing down the development of data-driven deep-learning-based methods. To address this issue, we construct the first 3D medical matting dataset. The validity of the dataset was verified through clinicians' assessments and downstream experiments.
Submitted 10 October, 2022;
originally announced October 2022.
-
Joint Beamforming Design for RIS-Assisted Integrated Sensing and Communication Systems
Authors:
Honghao Luo,
Rang Liu,
Ming Li,
Yang Liu,
Qian Liu
Abstract:
Integrated sensing and communication (ISAC) has been envisioned as a promising technology to tackle the spectrum congestion problem for future networks. In this correspondence, we investigate to deploy a reconfigurable intelligent surface (RIS) in an ISAC system for achieving better performance. In particular, a multi-antenna base station (BS) simultaneously serves multiple single-antenna users with the assistance of a RIS and detects potential targets. The active beamforming of the BS and the passive beamforming of the RIS are jointly optimized to maximize the achievable sum-rate of the communication users while satisfying the constraint of beampattern similarity for radar sensing, the restriction of the RIS, and the transmit power budget. An efficient alternating algorithm based on the fractional programming (FP), majorization-minimization (MM), and manifold optimization methods is developed to convert the resulting non-convex optimization problem into two solvable sub-problems and iteratively solve them. Simulation studies illustrate the advancement of deploying RIS in ISAC systems and the effectiveness of the proposed algorithm.
Submitted 3 August, 2022;
originally announced August 2022.
-
Weakly-supervised High-fidelity Ultrasound Video Synthesis with Feature Decoupling
Authors:
Jiamin Liang,
Xin Yang,
Yuhao Huang,
Kai Liu,
Xinrui Zhou,
Xindi Hu,
Zehui Lin,
Huanjia Luo,
Yuanji Zhang,
Yi Xiong,
Dong Ni
Abstract:
Ultrasound (US) is widely used for its advantages of real-time imaging, radiation-free and portability. In clinical practice, analysis and diagnosis often rely on US sequences rather than a single image to obtain dynamic anatomical information. This is challenging for novices to learn because practicing with adequate videos from patients is clinically unpractical. In this paper, we propose a novel framework to synthesize high-fidelity US videos. Specifically, the synthesis videos are generated by animating source content images based on the motion of given driving videos. Our highlights are three-fold. First, leveraging the advantages of self- and fully-supervised learning, our proposed system is trained in weakly-supervised manner for keypoint detection. These keypoints then provide vital information for handling complex high dynamic motions in US videos. Second, we decouple content and texture learning using the dual decoders to effectively reduce the model learning difficulty. Last, we adopt the adversarial training strategy with GAN losses for further improving the sharpness of the generated videos, narrowing the gap between real and synthesis videos. We validate our method on a large in-house pelvic dataset with high dynamic motion. Extensive evaluation metrics and user study prove the effectiveness of our proposed method.
Submitted 1 July, 2022;
originally announced July 2022.
-
Integrated Sensing and Communication with Reconfigurable Intelligent Surfaces: Opportunities, Applications, and Future Directions
Authors:
Rang Liu,
Ming Li,
Honghao Luo,
Qian Liu,
A. Lee Swindlehurst
Abstract:
Integrated sensing and communication (ISAC) is emerging as a key enabler to address the growing spectrum congestion problem and satisfy increasing demands for ubiquitous sensing and communication. By sharing various resources and information, ISAC achieves much higher spectral, energy, hardware, and economic efficiencies. Concurrently, reconfigurable intelligent surface (RIS) technology has been deemed as a promising approach due to its capability of intelligently manipulating the wireless propagation environment in an energy and hardware efficient manner. In this article, we analyze the potential of deploying RIS to improve communication and sensing performance in ISAC systems. We first describe the fundamentals of RIS and its applications in traditional communication and sensing systems, then introduce the principles of ISAC and overview existing explorations on RIS-assisted ISAC, followed by one case study to verify the advantages of deploying RIS in ISAC systems. Finally, open challenges and research directions are discussed to stimulate this line of research and pave the way for practical applications.
Submitted 16 June, 2022;
originally announced June 2022.
-
Beam Squint Assisted User Localization in Near-Field Communications Systems
Authors:
Hongliang Luo,
Feifei Gao
Abstract:
The beam squint phenomenon in massive multi-input and multi-output wideband communications has been widely concerned recently, which generally deteriorates the beamforming performance. In this paper, we find that with the aid of the time-delay lines (TDs), the range and trajectory of the beam squint of a near-field communications system can be freely controlled, and hence it is possible to reversely utilize the beam squint for user localization. We derive the trajectory equation for near-field beam squint points and design a way to control the trajectory of these beam squint points. With the proposed design, beamforming from different subcarriers would purposely point to different angles and different distances such that users from different positions would receive the maximum power at different subcarriers. Hence, one can simply find the different users' position from the beam squint effect. Simulation results demonstrate the effectiveness of the proposed scheme.
Submitted 23 May, 2022;
originally announced May 2022.
-
A Novel Markov Model for Near-Term Railway Delay Prediction
Authors:
Jin Xu,
Weiqi Wang,
Zheming Gao,
Haochen Luo,
Qian Wu
Abstract:
Predicting the near-future delay with accuracy for trains is momentous for railway operations and passengers' traveling experience. This work aims to design prediction models for train delays based on Netherlands Railway data. We first develop a chi-square test to show that the delay evolution over stations follows a first-order Markov chain. We then propose a delay prediction model based on non-homogeneous Markov chains. To deal with the sparsity of the transition matrices of the Markov chains, we propose a novel matrix recovery approach that relies on Gaussian kernel density estimation. Our numerical tests show that this recovery approach outperforms other heuristic approaches in prediction accuracy. The Markov chain model we propose also shows to be better than other widely-used time series models with respect to both interpretability and prediction accuracy. Moreover, our proposed model does not require a complicated training process, which is capable of handling large-scale forecasting problems.
Submitted 21 May, 2022;
originally announced May 2022.
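The abstract above describes recovering sparse Markov transition matrices with Gaussian kernel density estimation. The sketch below illustrates the general idea only and is not the authors' procedure: it assumes delays are binned into ordered integer states, counts the observed transitions, spreads each count over neighbouring state pairs with a Gaussian kernel, and normalizes each row. The function names, the `bandwidth` parameter, and the toy sequences are all hypothetical.

```python
import math


def transition_counts(sequences, n_states):
    """Count observed transitions in sequences of delay-state indices."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1.0
    return counts


def smoothed_transition_matrix(counts, bandwidth=1.0):
    """Kernel-smooth the counts over (from, to) pairs, then row-normalize."""
    n = len(counts)
    # Gaussian weight between state indices i and j.
    k = [[math.exp(-0.5 * ((i - j) / bandwidth) ** 2) for j in range(n)]
         for i in range(n)]
    matrix = []
    for i in range(n):
        row = [sum(k[i][a] * k[j][b] * counts[a][b]
                   for a in range(n) for b in range(n))
               for j in range(n)]
        total = sum(row)
        # Fall back to a uniform row only if no transitions were observed at all.
        matrix.append([v / total for v in row] if total > 0 else [1.0 / n] * n)
    return matrix


# Toy data: two trains, three delay states; state 2 is never left in the data.
sequences = [[0, 1, 2], [0, 1, 1]]
raw = transition_counts(sequences, 3)
P = smoothed_transition_matrix(raw, bandwidth=1.0)
```

In this toy example the raw counts leave the row for state 2 empty, while the smoothed matrix assigns it a valid probability distribution borrowed from neighbouring states. A non-homogeneous model in the sense of the abstract would hold one such matrix per station pair.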
-
An Extreme Learning Machine-Based System Frequency Nadir Constraint Linearization Method
Authors:
Likai Liu,
Zechun Hu,
Nikhil Pathak,
Haocheng Luo
Abstract:
Large-scale integration of converter-based renewable energy sources (RESs) into the power system will lead to a higher risk of frequency nadir limit violation and even frequency instability after a large power disturbance. Therefore, it is essential to consider the frequency nadir constraint (FNC) in power system scheduling. Nevertheless, the FNC is highly nonlinear and non-convex. The state-of-the-art method to simplify the constraint is to first construct a low-order frequency response model and then linearize the frequency nadir equation. In this letter, an extreme learning machine (ELM)-based network is built to derive the linear formulation of the FNC, where the two-step fitting process is integrated into one training process and more details about the physical model of the generator are considered to reduce the fitting error. Simulation results show the superiority of the proposed method in fitting accuracy.
Submitted 25 October, 2021; v1 submitted 12 August, 2021;
originally announced August 2021.
-
Integrated Communication and Navigation for Ultra-Dense LEO Satellite Networks: Vision, Challenges and Solutions
Authors:
Yu Wang,
Hejia Luo,
Ying Chen,
Jun Wang,
Rong Li,
Bin Wang
Abstract:
Next generation beyond 5G networks are expected to provide both Terabits per second data rate communication services and centimeter-level accuracy localization services in an efficient, seamless and cost-effective manner. However, most of the current communication and localization systems are separately designed, leading to an under-utilization of radio resources and network performance degradation. In this paper, we propose an integrated communication and navigation (ICAN) framework to fully unleash the potential of ultra-dense LEO satellite networks for optimal provisioning of differentiated services. The specific benefits, feasibility analysis and challenges for ICAN-enabled satellite systems are explicitly discussed. In particular, a novel beam-hopping-based ICAN satellite system solution is devised to adaptively tune the network beam layout for dual-functional communication and positioning purposes. Furthermore, a thorough experimental platform is built following the Third Generation Partnership Project (3GPP) defined non-terrestrial network simulation parameters to validate the performance gain of the ICAN satellite system.
Submitted 19 May, 2021;
originally announced May 2021.
-
NTIRE 2021 Challenge on Perceptual Image Quality Assessment
Authors:
Jinjin Gu,
Haoming Cai,
Chao Dong,
Jimmy S. Ren,
Yu Qiao,
Shuhang Gu,
Radu Timofte,
Manri Cheon,
Sungjun Yoon,
Byungyeon Kang,
Junwoo Lee,
Qing Zhang,
Haiyang Guo,
Yi Bin,
Yuqing Hou,
Hengliang Luo,
Jingyu Guo,
Zirui Wang,
Hai Wang,
Wenming Yang,
Qingyan Bai,
Shuwei Shi,
Weihao Xia,
Mingdeng Cao,
Jiahao Wang
, et al. (25 additional authors not shown)
Abstract:
This paper reports on the NTIRE 2021 challenge on perceptual image quality assessment (IQA), held in conjunction with the New Trends in Image Restoration and Enhancement (NTIRE) workshop at CVPR 2021. As a new type of image processing technology, perceptual image processing algorithms based on Generative Adversarial Networks (GAN) have produced images with more realistic textures. These output images have completely different characteristics from traditional distortions, and thus pose a new challenge for IQA methods to evaluate their visual quality. In comparison with previous IQA challenges, the training and testing datasets in this challenge include the outputs of perceptual image processing algorithms and the corresponding subjective scores. Thus they can be used to develop and evaluate IQA methods on GAN-based distortions. The challenge has 270 registered participants in total. In the final testing stage, 13 participating teams submitted their models and fact sheets. Almost all of them have achieved much better results than existing IQA methods, while the winning method demonstrates state-of-the-art performance.
Submitted 28 June, 2021; v1 submitted 7 May, 2021;
originally announced May 2021.
-
Boundary and Context Aware Training for CIF-based Non-Autoregressive End-to-end ASR
Authors:
Fan Yu,
Haoneng Luo,
Pengcheng Guo,
Yuhao Liang,
Zhuoyuan Yao,
Lei Xie,
Yingying Gao,
Leijing Hou,
Shilei Zhang
Abstract:
Continuous integrate-and-fire (CIF) based models, which use a soft and monotonic alignment mechanism, have been well applied in non-autoregressive (NAR) speech recognition with competitive performance compared with other NAR methods. However, such an alignment learning strategy may suffer from an erroneous acoustic boundary estimation, severely hindering the convergence speed as well as the system performance. In this paper, we propose a boundary and context aware training approach for CIF based NAR models. Firstly, the connectionist temporal classification (CTC) spike information is utilized to guide the learning of acoustic boundaries in the CIF. Besides, an additional contextual decoder is introduced behind the CIF decoder, aiming to capture the linguistic dependencies within a sentence. Finally, we adopt a recently proposed Conformer architecture to improve the capacity of acoustic modeling. Experiments on the open-source Mandarin AISHELL-1 corpus show that the proposed method achieves a comparable character error rate (CER) of 4.9% with only 1/24 latency compared with a state-of-the-art autoregressive (AR) Conformer model. Furthermore, when evaluating on an internal 7500-hour Mandarin corpus, our model still outperforms other NAR methods and even reaches the AR Conformer model on a challenging real-world noisy test set.
Submitted 26 September, 2021; v1 submitted 10 April, 2021;
originally announced April 2021.
-
Agent with Warm Start and Adaptive Dynamic Termination for Plane Localization in 3D Ultrasound
Authors:
Xin Yang,
Haoran Dou,
Ruobing Huang,
Wufeng Xue,
Yuhao Huang,
Jikuan Qian,
Yuanji Zhang,
Huanjia Luo,
Huizhi Guo,
Tianfu Wang,
Yi Xiong,
Dong Ni
Abstract:
Accurate standard plane (SP) localization is the fundamental step for prenatal ultrasound (US) diagnosis. Typically, dozens of US SPs are collected to determine the clinical diagnosis. 2D US has to perform scanning for each SP, which is time-consuming and operator-dependent. While 3D US containing multiple SPs in one shot has the inherent advantages of less user-dependency and more efficiency. Automatically locating SP in 3D US is very challenging due to the huge search space and large fetal posture variations. Our previous study proposed a deep reinforcement learning (RL) framework with an alignment module and active termination to localize SPs in 3D US automatically. However, termination of agent search in RL is important and affects the practical deployment. In this study, we enhance our previous RL framework with a newly designed adaptive dynamic termination to enable an early stop for the agent searching, saving at most 67% inference time, thus boosting the accuracy and efficiency of the RL framework at the same time. Besides, we validate the effectiveness and generalizability of our algorithm extensively on our in-house multi-organ datasets containing 433 fetal brain volumes, 519 fetal abdomen volumes, and 683 uterus volumes. Our approach achieves localization error of 2.52mm/10.26 degrees, 2.48mm/10.39 degrees, 2.02mm/10.48 degrees, 2.00mm/14.57 degrees, 2.61mm/9.71 degrees, 3.09mm/9.58 degrees, 1.49mm/7.54 degrees for the transcerebellar, transventricular, transthalamic planes in fetal brain, abdominal plane in fetal abdomen, and mid-sagittal, transverse and coronal planes in uterus, respectively. Experimental results show that our method is general and has the potential to improve the efficiency and standardization of US scanning.
Submitted 26 March, 2021;
originally announced March 2021.
-
Super-resolving Compressed Images via Parallel and Series Integration of Artifact Reduction and Resolution Enhancement
Authors:
Hongming Luo,
Fei Zhou,
Guangsen Liao,
Guoping Qiu
Abstract:
In real-world applications, such as sharing photos on social media platforms, images are always not only sub-sampled but also heavily compressed thus often containing various artefacts. Simple methods for enhancing the resolution of such images will exacerbate the artefacts, rendering them visually objectionable. In spite of its high practical values, super-resolving compressed images is not well studied in the literature. In this paper, we propose a novel compressed image super resolution (CISR) framework based on parallel and series integration of artefacts removal and resolution enhancement. Based on a mathematical inference model for estimating a clean low-resolution (LR) image and a clean high-resolution (HR) image from a down-sampled and compressed observation, we have designed a CISR architecture consisting of two deep neural network modules: the artefacts removal module (ARM) and the resolution enhancement module (REM). The ARM and the REM work in parallel with both taking the compressed LR image as their inputs, at the same time they also work in series with the REM taking the output of the ARM as one of its inputs and the ARM taking the output of the REM as its other input. A technique called unfolding is introduced to recursively suppress the compression artefacts and restore the image resolution. A unique feature of our CISR system is that it exploits the parallel and series connections between the ARM and the REM, and recursive optimization to reduce the model's dependency on specific types of degradation thus making it possible to train a single model to super-resolve images compressed by different methods to different qualities. Codes and datasets are available at https://github.com/luohongming/CISR_PSI.git
Submitted 21 November, 2022; v1 submitted 2 March, 2021;
originally announced March 2021.
-
Bridge the Vision Gap from Field to Command: A Deep Learning Network Enhancing Illumination and Details
Authors:
Zhuqing Jiang,
Chang Liu,
Ya'nan Wang,
Kai Li,
Aidong Men,
Haiying Wang,
Haiyong Luo
Abstract:
With the goal of tuning up the brightness, low-light image enhancement enjoys numerous applications, such as surveillance, remote sensing and computational photography. Images captured under low-light conditions often suffer from poor visibility and blur. Solely brightening the dark regions will inevitably amplify the blur, thus may lead to detail loss. In this paper, we propose a simple yet effective two-stream framework named NEID to tune up the brightness and enhance the details simultaneously without introducing many computational costs. Precisely, the proposed method consists of three parts: Light Enhancement (LE), Detail Refinement (DR) and Feature Fusing (FF) module, which can aggregate composite features oriented to multiple tasks based on channel attention mechanism. Extensive experiments conducted on several benchmark datasets demonstrate the efficacy of our method and its superiority over state-of-the-art methods.
Submitted 20 January, 2021;
originally announced January 2021.
-
VHS to HDTV Video Translation using Multi-task Adversarial Learning
Authors:
Hongming Luo,
Guangsen Liao,
Xianxu Hou,
Bozhi Liu,
Fei Zhou,
Guoping Qiu
Abstract:
There is a large amount of valuable video archives in Video Home System (VHS) format. However, due to their analog nature, their quality is often poor. Compared to High-definition television (HDTV), VHS video not only has a dull color appearance but also has a lower resolution and often appears blurry. In this paper, we focus on the problem of translating VHS video to HDTV video and have developed a solution based on a novel unsupervised multi-task adversarial learning model. Inspired by the success of generative adversarial network (GAN) and CycleGAN, we employ cycle consistency loss, adversarial loss and perceptual loss together to learn a translation model. An important innovation of our work is the incorporation of a super-resolution model and a color transfer model that can solve the unsupervised multi-task problem. To our knowledge, this is the first work dedicated to the study of the relation between VHS and HDTV and the first computational solution to translate VHS to HDTV. We present experimental results to demonstrate the effectiveness of our solution qualitatively and quantitatively.
Submitted 7 January, 2021;
originally announced January 2021.
-
Spectral-change enhancement with prior SNR for the hearing impaired
Authors:
Xiang Li,
Xin Tian,
Henry Luo,
Jinyu Qian,
Xihong Wu,
Dingsheng Luo,
Jing Chen
Abstract:
A previous signal processing algorithm that aimed to enhance spectral changes (SCE) over time was shown to help hearing-impaired (HI) listeners recognize speech in background noise. In this work, the previous SCE was modified to operate on target-dominant segments, rather than treating all frames equally. Instantaneous signal-to-noise ratios (SNRs) were calculated to determine whether the segments should be processed. Initially, the ideal SNR calculated from knowledge of the premixed signals was introduced to the previous SCE algorithm (SCE-iSNR). Speech intelligibility (SI) and clarity preference were measured for 12 HI listeners in steady speech-spectrum noise (SSN) and six-talk speech (STS) maskers, respectively. The results showed that the SCE-iSNR algorithm improved SI significantly for both maskers at high signal-to-masker ratios (SMRs) and for the STS masker at low SMRs, while the processing effect on speech quality was small. Secondly, the estimated SNR obtained from real mixtures was used, resulting in a second variant, SCE-eSNR. SI and subjective ratings of naturalness and speech quality were tested for 7 HI subjects. The SCE-eSNR algorithm showed improved SI for the SSN masker at high SMRs and for the STS masker at low SMRs, as well as better naturalness and speech quality for the STS masker. The limitations of applying the algorithms are discussed.
Submitted 6 August, 2020;
originally announced August 2020.