Search | arXiv e-print repository

Hi-SAM: A high-scalable authentication model for satellite-ground Zero-Trust system using mean field game

Authors: Xuesong Wu, Tianshuai Zheng, Runfang Wu, Jie Ren, Junyan Guo, Ye Du

Abstract: As more and more Internet of Things (IoT) devices are connected to satellite networks, the Zero-Trust Architecture brings dynamic security to the satellite-ground system, while frequent authentication creates challenges for system availability. To make the system accommodate more IoT devices, this paper proposes a high-scalable authentication model (Hi-SAM). Hi-SAM introduces the Proof-of-Work idea to authentication, which allows devices to obtain network resources based on frequency. To optimize the frequency, a mean field game is used for competition among devices, which can reduce the decision space of large-scale population games. In addition, a dynamic time-range message authentication code is designed for security. In tests at large population scales, Hi-SAM is superior in the optimization of authentication workload and the anomaly detection efficiency.

Submitted 12 August, 2024; originally announced August 2024.

arXiv:2408.04677 [pdf, other]

Open-Source Software Architecture for Multi-Robot Wire Arc Additive Manufacturing (WAAM)

Authors: Honglu He, Chen-lung Lu, Jinhan Ren, Joni Dhar, Glenn Saunders, John Wason, Johnson Samuel, Agung Julius, John T. Wen

Abstract: Wire Arc Additive Manufacturing (WAAM) is a metal 3D printing technology that deposits molten metal wire on a substrate to form desired geometries. Articulated robot arms are commonly used in WAAM to produce complex geometric shapes. However, they mostly rely on proprietary robot and weld control software that limits process tuning and customization, incorporation of third-party sensors, implementation on robots and weld controllers from multiple vendors, and customizable user programming. This paper presents a general open-source software architecture for WAAM that addresses these limitations. The foundation of this architecture is Robot Raconteur, an open-source control and communication framework that serves as the middleware for integrating robots and sensors from different vendors. Based on this architecture, we developed an end-to-end robotic WAAM implementation that takes a CAD file to a printed WAAM part and evaluates the accuracy of the result. The major components in the architecture include part slicing, robot motion planning, part metrology, in-process sensing, and process tuning. The current implementation is based on Motoman robots and a Fronius weld controller, but the approach is applicable to other industrial robots and weld controllers. The capability of the WAAM testbed is demonstrated through the printing of parts of various geometries and acquisition of in-process sensor data for motion adjustment.

Submitted 7 August, 2024; originally announced August 2024.

arXiv:2408.02047 [pdf, other]

Latency-Aware Resource Allocation for Mobile Edge Generation and Computing via Deep Reinforcement Learning

Authors: Yinyu Wu, Xuhui Zhang, Jinke Ren, Huijun Xing, Yanyan Shen, Shuguang Cui

Abstract: Recently, the integration of mobile edge computing (MEC) and generative artificial intelligence (GAI) technology has given rise to a new area called mobile edge generation and computing (MEGC), which offers mobile users heterogeneous services such as task computing and content generation. In this letter, we investigate the joint communication, computation, and the AIGC resource allocation problem in an MEGC system. A latency minimization problem is first formulated to enhance the quality of service for mobile users. Due to the strong coupling of the optimization variables, we propose a new deep reinforcement learning-based algorithm to solve it efficiently. Numerical results demonstrate that the proposed algorithm can achieve lower latency than two baseline algorithms.

Submitted 4 August, 2024; originally announced August 2024.

Comments: 5 pages, 5 figures, submitted to IEEE

arXiv:2406.15997 [pdf, ps, other]

State-Compensation-Linearization-Based Stability Margin Analysis for a Class of Nonlinear Systems: A Data-Driven Method

Authors: Jinrui Ren, Quan Quan

Abstract: The classical stability margin analysis based on the linearized model is widely used in practice even in nonlinear systems. Although linear analysis techniques are relatively standard and have simple implementation structures, they are prone to misbehavior and failure when the system is performing an off-nominal operation. To avoid the drawbacks and exploit the advantages of linear analysis methods and frequency-domain stability margin analysis while tackling system nonlinearity, a state-compensation-linearization-based stability margin analysis method is studied in the paper. Based on the state-compensation-linearization-based stabilizing control, the definition and measurement of the stability margin are given. The l2 gain margin and l2 time-delay margin for the closed-loop nonlinear system with state-compensation-linearization-based stabilizing control are defined and derived approximately by the small-gain theorem in theory. The stability margin measurement can be carried out by the frequency-sweep method in practice. The proposed method is a data-driven method for obtaining the stability margin of nonlinear systems, which is practical and can be applied to practical systems directly. Finally, three numerical examples are given to illustrate the effectiveness of the proposed method.

Submitted 22 June, 2024; originally announced June 2024.

arXiv:2406.07374 [pdf, other]

Movable-Antenna Array Empowered ISAC Systems for Low-Altitude Economy

Authors: Ziming Kuang, Wenchao Liu, Chunjie Wang, Zhenzhen Jin, Jinke Ren, Xuhui Zhang, Yanyan Shen

Abstract: This paper investigates a movable-antenna (MA) array empowered integrated sensing and communications (ISAC) over low-altitude platform (LAP) system to support low-altitude economy (LAE) applications. In the considered system, an unmanned aerial vehicle (UAV) is dispatched to hover in the air, working as the UAV-enabled LAP (ULAP) to provide information transmission and sensing simultaneously for LAE applications. To improve the throughput capacity and meet the requirement of the sensing beampattern threshold, we formulate a data rate maximization problem by jointly optimizing the transmit information and sensing beamforming, and the antenna positions of the MA array. Since the data rate maximization problem is non-convex with highly coupled variables, we propose an efficient alternating optimization based algorithm, which iteratively optimizes parts of the variables while fixing the others. Numerical results show the superiority of the proposed MA array-based scheme in terms of the achievable data rate and beamforming gain compared with two benchmark schemes.

Submitted 21 July, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

Comments: Will be presented in IEEE/CIC International Conference on Communications in China Workshops 2024 in Hangzhou

arXiv:2406.07255 [pdf, other]

Towards Realistic Data Generation for Real-World Super-Resolution

Authors: Long Peng, Wenbo Li, Renjing Pei, Jingjing Ren, Xueyang Fu, Yang Wang, Yang Cao, Zheng-Jun Zha

Abstract: Existing image super-resolution (SR) techniques often fail to generalize effectively in complex real-world settings due to the significant divergence between training data and practical scenarios. To address this challenge, previous efforts have either manually simulated intricate physical-based degradations or utilized learning-based techniques, yet these approaches remain inadequate for producing large-scale, realistic, and diverse data simultaneously. In this paper, we introduce a novel Realistic Decoupled Data Generator (RealDGen), an unsupervised learning data generation framework designed for real-world super-resolution. We meticulously develop content and degradation extraction strategies, which are integrated into a novel content-degradation decoupled diffusion model to create realistic low-resolution images from unpaired real LR and HR images. Extensive experiments demonstrate that RealDGen excels in generating large-scale, high-quality paired data that mirrors real-world degradations, significantly advancing the performance of popular SR models on various real-world benchmarks.

Submitted 11 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.05700 [pdf, other]

HDMba: Hyperspectral Remote Sensing Imagery Dehazing with State Space Model

Authors: Hang Fu, Genyun Sun, Yinhe Li, Jinchang Ren, Aizhu Zhang, Cheng Jing, Pedram Ghamisi

Abstract: Haze contamination in hyperspectral remote sensing images (HSI) can lead to spatial visibility degradation and spectral distortion. Haze in HSI exhibits spatial irregularity and inhomogeneous spectral distribution, with few dehazing networks available. Current CNN and Transformer-based dehazing methods fail to balance global scene recovery, local detail retention, and computational efficiency. Inspired by the ability of Mamba to model long-range dependencies with linear complexity, we explore its potential for HSI dehazing and propose the first HSI Dehazing Mamba (HDMba) network. Specifically, we design a novel window selective scan module (WSSM) that captures local dependencies within windows and global correlations between windows by partitioning them. This approach improves the ability of conventional Mamba in local feature extraction. By modeling the local and global spectral-spatial information flow, we achieve a comprehensive analysis of hazy regions. The DehazeMamba layer (DML), constructed by WSSM, and residual DehazeMamba (RDM) blocks, composed of DMLs, are the core components of the HDMba framework. These components effectively characterize the complex distribution of haze in HSIs, aiding in scene reconstruction and dehazing. Experimental results on the Gaofen-5 HSI dataset demonstrate that HDMba outperforms other state-of-the-art methods in dehazing performance. The code will be available at https://github.com/RsAI-lab/HDMba.

Submitted 9 June, 2024; originally announced June 2024.

arXiv:2406.04324 [pdf, other]

SF-V: Single Forward Video Generation Model

Authors: Zhixing Zhang, Yanyu Li, Yushu Wu, Yanwu Xu, Anil Kag, Ivan Skorokhodov, Willi Menapace, Aliaksandr Siarohin, Junli Cao, Dimitris Metaxas, Sergey Tulyakov, Jian Ren

Abstract: Diffusion-based video generation models have demonstrated remarkable success in obtaining high-fidelity videos through the iterative denoising process. However, these models require multiple denoising steps during sampling, resulting in high computational costs. In this work, we propose a novel approach to obtain single-step video generation models by leveraging adversarial training to fine-tune pre-trained video diffusion models. We show that, through the adversarial training, the multi-step video diffusion model, i.e., Stable Video Diffusion (SVD), can be trained to perform a single forward pass to synthesize high-quality videos, capturing both temporal and spatial dependencies in the video data. Extensive experiments demonstrate that our method achieves competitive generation quality of synthesized videos with significantly reduced computational overhead for the denoising process (i.e., around $23\times$ speedup compared with SVD and $6\times$ speedup compared with existing works, with even better generation quality), paving the way for real-time video synthesis and editing. More visualization results are made publicly available at https://snap-research.github.io/SF-V.

Submitted 6 June, 2024; originally announced June 2024.

Comments: Project Page: https://snap-research.github.io/SF-V

arXiv:2406.00285 [pdf, ps, other]

State Compensation Linearization and Control

Authors: Quan Quan, Jinrui Ren

Abstract: The linearization method builds a bridge from mature methods for linear systems to nonlinear systems and has been widely used in various areas. There are currently two main linearization methods: Jacobian linearization and feedback linearization. However, the Jacobian linearization method has approximate and local properties, and the feedback linearization method has a singularity problem and loses the physical meaning of the obtained states. Thus, as a kind of complementation, a new linearization method named state compensation linearization is proposed in the paper. Their differences, advantages, and disadvantages are discussed in detail. Based on the state compensation linearization, a state-compensation-linearization-based control framework is proposed for a class of nonlinear systems. Under the new framework, the original problem can be simplified. The framework also allows different control methods, especially those only applicable to linear systems, to be incorporated. Three illustrative examples are also given to show the process and effectiveness of the proposed linearization method and control framework.

Submitted 31 May, 2024; originally announced June 2024.

arXiv:2405.20746 [pdf, ps, other]

doi 10.1109/LWC.2024.3451246

UAV-Enabled Wireless Networks with Movable-Antenna Array: Flexible Beamforming and Trajectory Design

Authors: Wenchao Liu, Xuhui Zhang, Huijun Xing, Jinke Ren, Yanyan Shen, Shuguang Cui

Abstract: Recently, the movable antenna (MA) array has become a promising technology for improving the communication quality in wireless communication systems. In this letter, an unmanned aerial vehicle (UAV) enabled multi-user multi-input-single-output system enhanced by the MA array is investigated. To enhance the throughput capacity, we aim to maximize the achievable data rate by jointly optimizing the transmit beamforming, the UAV trajectory, and the positions of the MA array antennas. The formulated data rate maximization problem is a highly coupled non-convex problem, for which an alternating optimization based algorithm is proposed to get a sub-optimal solution. Numerical results have demonstrated the performance gain of the proposed method compared with the conventional method using a fixed-position antenna array.

Submitted 26 August, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

Comments: This paper has been accepted for publication by IEEE Wireless Communications Letters. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2405.15705 [pdf, other]

Sums: Sniffing Unknown Multiband Signals under Low Sampling Rates

Authors: Jinbo Peng, Zhe Chen, Zheng Lin, Haoxuan Yuan, Zihan Fang, Lingzhong Bao, Zihang Song, Ying Li, Jing Ren, Yue Gao

Abstract: Due to sophisticated deployments of all kinds of wireless networks (e.g., 5G, Wi-Fi, Bluetooth, LEO satellite, etc.), multiband signals are distributed over a large bandwidth (e.g., from 70 MHz to 8 GHz). Consequently, for network monitoring and spectrum sharing applications, a sniffer for extracting physical layer information, such as the structure of packets, with a low sampling rate (especially, sub-Nyquist sampling) can significantly improve their cost- and energy-efficiency. However, building a multiband signal sniffer is challenging. To this end, we propose Sums, a system that can sniff and analyze multiband signals in a blind manner. Our Sums takes advantage of hardware and algorithm co-design, multi-coset sub-Nyquist sampling hardware, and a multi-task deep learning framework. The hardware component breaks the Nyquist rule to sample GHz bandwidth, but only pays for a 50 MSPS sampling rate. Our multi-task learning framework directly tackles the sampling data to perform spectrum sensing, physical layer protocol recognition, and demodulation for deep inspection from multiband signals. Extensive experiments demonstrate that Sums achieves higher accuracy than the state-of-the-art baselines in spectrum sensing, modulation classification, and demodulation. As a result, our Sums can help researchers and end-users to diagnose or troubleshoot problems with wireless infrastructure deployments in practice.

Submitted 24 May, 2024; originally announced May 2024.

Comments: 12 pages, 9 figures

arXiv:2405.04867 [pdf, other]

MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results

Authors: Yaqi Wu, Zhihao Fan, Xiaofeng Chu, Jimmy S. Ren, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangcheng Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Senyan Xu, Zhijing Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha, Jun Cao, Cheng Li, Shu Chen, Liang Ma, Shiyang Zhou, Haijin Zeng, Kai Feng, et al. (24 additional authors not shown)

Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Building on the achievements of the previous MIPI Workshops held at ECCV 2022 and CVPR 2023, we introduce our third MIPI challenge including three tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Nighttime Flare Removal track on MIPI 2024. In total, 170 participants were successfully registered, and 14 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art performance on Nighttime Flare Removal. More details of this challenge and the link to the dataset can be found at https://mipi-challenge.org/MIPI2024/.

Submitted 8 May, 2024; originally announced May 2024.

Comments: MIPI@CVPR2024. Website: https://mipi-challenge.org/MIPI2024/

arXiv:2405.02191 [pdf]

Non-Destructive Peat Analysis using Hyperspectral Imaging and Machine Learning

Authors: Yijun Yan, Jinchang Ren, Barry Harrison, Oliver Lewis, Yinhe Li, Ping Ma

Abstract: Peat, a crucial component in whisky production, imparts distinctive and irreplaceable flavours to the final product. However, the extraction of peat disrupts ancient ecosystems and releases significant amounts of carbon, contributing to climate change. This paper aims to address this issue by conducting a feasibility study on enhancing peat use efficiency in whisky manufacturing through non-destructive analysis using hyperspectral imaging. Results show that short-wave infrared (SWIR) data is more effective for analyzing peat samples and predicting total phenol levels, with accuracies up to 99.81%.

Submitted 3 May, 2024; originally announced May 2024.

Comments: 4 pages, 4 figures

arXiv:2405.00736 [pdf, other]

Joint Signal Detection and Automatic Modulation Classification via Deep Learning

Authors: Huijun Xing, Xuhui Zhang, Shuo Chang, Jinke Ren, Zixun Zhang, Jie Xu, Shuguang Cui

Abstract: Signal detection and modulation classification are two crucial tasks in various wireless communication systems. Different from prior works that investigate them independently, this paper studies the joint signal detection and automatic modulation classification (AMC) by considering a realistic and complex scenario, in which multiple signals with different modulation schemes coexist at different carrier frequencies. We first generate a coexisting RADIOML dataset (CRML23) to facilitate the joint design. Different from the publicly available AMC dataset ignoring the signal detection step and containing only one signal, our synthetic dataset covers the more realistic multiple-signal coexisting scenario. Then, we present a joint framework for detection and classification (JDM) for such a multiple-signal coexisting environment, which consists of two modules for signal detection and AMC, respectively. In particular, these two modules are interconnected using a designated data structure called "proposal". Finally, we conduct extensive simulations over the newly developed dataset, which demonstrate the effectiveness of our designs. Our code and dataset are now available as open-source (https://github.com/Singingkettle/ChangShuoRadioData).

Submitted 29 April, 2024; originally announced May 2024.

arXiv:2404.17484 [pdf, other]

Sparse Reconstruction of Optical Doppler Tomography Based on State Space Model

Authors: Zhenghong Li, Jiaxiang Ren, Wensheng Cheng, Congwu Du, Yingtian Pan, Haibin Ling

Abstract: Optical Doppler Tomography (ODT) is a blood flow imaging technique popularly used in bioengineering applications. The fundamental unit of ODT is the 1D frequency response along the A-line (depth), named raw A-scan. A 2D ODT image (B-scan) is obtained by first sensing raw A-scans along the B-line (width), and then constructing the B-scan from these raw A-scans via magnitude-phase analysis and post-processing. To obtain a high-resolution B-scan with a precise flow map, densely sampled A-scans are required in current methods, causing both computational and storage burdens. To address this issue, in this paper we propose a novel sparse reconstruction framework with four main sequential steps: 1) early magnitude-phase fusion that encourages rich interaction of the complementary information in magnitude and phase, 2) State Space Model (SSM)-based representation learning, inspired by recent successes in Mamba and VMamba, to naturally capture both the intra-A-scan sequential information and between-A-scan interactions, 3) an Inception-based Feedforward Network module (IncFFN) to further boost the SSM-module, and 4) a B-line Pixel Shuffle (BPS) layer to effectively reconstruct the final results. In the experiments on real-world animal data, our method shows clear effectiveness in reconstruction accuracy. As the first application of SSM for image reconstruction tasks, we expect our work to inspire related explorations in not only efficient ODT imaging techniques but also generic image enhancement.

Submitted 26 April, 2024; originally announced April 2024.

Comments: 19 pages, 5 figures

arXiv:2404.15256 [pdf, other]

TOP-Nav: Legged Navigation Integrating Terrain, Obstacle and Proprioception Estimation

Authors: Junli Ren, Yikai Liu, Yingru Dai, Junfeng Long, Guijin Wang

Abstract: Legged navigation is typically examined within open-world, off-road, and challenging environments. In these scenarios, estimating external disturbances requires a complex synthesis of multi-modal information. This underlines a major limitation in existing works that primarily focus on avoiding obstacles. In this work, we propose TOP-Nav, a novel legged navigation framework that integrates a comprehensive path planner with Terrain awareness, Obstacle avoidance and closed-loop Proprioception. TOP-Nav underscores the synergies between vision and proprioception in both path and motion planning. Within the path planner, we present and integrate a terrain estimator that enables the robot to select waypoints on terrains with higher traversability while effectively avoiding obstacles. At the motion planning level, we not only implement a locomotion controller to track the navigation commands, but also construct a proprioception advisor to provide motion evaluations for the path planner. Based on the closed-loop motion feedback, we make online corrections for the vision-based terrain and obstacle estimations. Consequently, TOP-Nav achieves open-world navigation in which the robot can handle terrains or disturbances beyond the distribution of prior knowledge and overcome constraints imposed by visual conditions. Building upon extensive experiments conducted in both simulation and real-world environments, TOP-Nav demonstrates superior performance in open-world navigation compared to existing methods.

Submitted 12 July, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

arXiv:2403.02601 [pdf, other]

Low-Res Leads the Way: Improving Generalization for Super-Resolution by Self-Supervised Learning

Authors: Haoyu Chen, Wenbo Li, Jinjin Gu, Jingjing Ren, Haoze Sun, Xueyi Zou, Zhensong Zhang, Youliang Yan, Lei Zhu

Abstract: For image super-resolution (SR), bridging the gap between the performance on synthetic datasets and real-world degradation scenarios remains a challenge. This work introduces a novel "Low-Res Leads the Way" (LWay) training framework, merging Supervised Pre-training with Self-supervised Learning to enhance the adaptability of SR models to real-world images. Our approach utilizes a low-resolution (LR) reconstruction network to extract degradation embeddings from LR images, merging them with super-resolved outputs for LR reconstruction. Leveraging unseen LR images for self-supervised learning guides the model to adapt its modeling space to the target domain, facilitating fine-tuning of SR models without requiring paired high-resolution (HR) images. The integration of Discrete Wavelet Transform (DWT) further refines the focus on high-frequency details. Extensive evaluations show that our method significantly improves the generalization and detail restoration capabilities of SR models on unseen real-world datasets, outperforming existing methods. Our training regime is universally compatible, requiring no network architecture modifications, making it a practical solution for real-world SR applications.

Submitted 4 March, 2024; originally announced March 2024.

Comments: Accepted to CVPR 2024

arXiv:2402.12658 [pdf, other]

doi 10.1109/OCEANSLimerick52467.2023.10244447

Guiding the underwater acoustic target recognition with interpretable contrastive learning

Authors: Yuan Xie, Jiawei Ren, Ji Xu

Abstract: Recognizing underwater targets from acoustic signals is a challenging task owing to the intricate ocean environments and variable underwater channels. While deep learning-based systems have become the mainstream approach for underwater acoustic target recognition, they have faced criticism for their lack of interpretability and weak generalization performance in practical applications. In this work, we apply the class activation mapping (CAM) to generate visual explanations for the predictions of a spectrogram-based recognition system. CAM can help to understand the behavior of recognition models by highlighting the regions of the input features that contribute the most to the prediction. Our explorations reveal that recognition models tend to focus on the low-frequency line spectrum and high-frequency periodic modulation information of underwater signals. Based on the observation, we propose an interpretable contrastive learning (ICL) strategy that employs two encoders to learn from acoustic features with different emphases (line spectrum and modulation information). By imposing constraints between encoders, the proposed strategy can enhance the generalization performance of the recognition system. Our experiments demonstrate that the proposed contrastive learning approach can improve the recognition accuracy and bring significant improvements across various underwater databases.

Submitted 19 February, 2024; originally announced February 2024.

Journal ref: OCEANS 2023-Limerick. IEEE, 2023: 1-6

arXiv:2402.11919 [pdf, other]

doi 10.1016/j.eswa.2024.123431

Unraveling Complex Data Diversity in Underwater Acoustic Target Recognition through Convolution-based Mixture of Experts

Authors: Yuan Xie, Jiawei Ren, Ji Xu

Abstract: Underwater acoustic target recognition is a difficult task owing to the intricate nature of underwater acoustic signals. The complex underwater environments, unpredictable transmission channels, and dynamic motion states greatly impact the real-world underwater acoustic signals, and may even obscure the intrinsic characteristics related to targets. Consequently, the data distribution of underwater acoustic signals exhibits high intra-class diversity, thereby compromising the accuracy and robustness of recognition systems. To address these issues, this work proposes a convolution-based mixture of experts (CMoE) that recognizes underwater targets in a fine-grained manner. The proposed technique introduces multiple expert layers as independent learners, along with a routing layer that determines the assignment of experts according to the characteristics of inputs. This design allows the model to utilize independent parameter spaces, facilitating the learning of complex underwater signals with high intra-class diversity. Furthermore, this work optimizes the CMoE structure by balancing regularization and an optional residual module. To validate the efficacy of our proposed techniques, we conducted detailed experiments and visualization analyses on three underwater acoustic databases across several acoustic features. The experimental results demonstrate that our CMoE consistently achieves significant performance improvements, delivering superior recognition accuracy when compared to existing advanced methods.

Submitted 30 April, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

Journal ref: Expert Systems with Applications (2024): 123431

arXiv:2401.09648 [pdf]

Staggered Comb Reference Signal Design for Integrated Communication and Sensing

Authors: Rui Zhang, Shawn Tsai, Tzu-Han Chou, Jiaying Ren

Abstract: Ambiguity performance is a critical criterion in radar sensor design, which indicates the ambiguities arising from multiple target estimation and detection. We considered a requirement-driven selection of OFDM reference signal (RS) patterns based on ambiguity performances for bi-static sensing in integrated communication and sensing with minimal modifications of current RSs. An RS pattern with a staggering offset of a linear slope that is relatively prime to the RS comb size is suggested for standard-resolution sensing algorithms to obtain the best ambiguity performances. Moreover, an extended guard interval design is proposed to increase the maximum time delay that remains inter-symbol interference (ISI)-free when using post-FFT sensing algorithms. The proposed techniques are promising to extend the distance and speed without ambiguities and ISI for sensing.

Submitted 25 April, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

Comments: accepted by IEEE International Symposium on Personal, Indoor and Mobile Radio Communications. arXiv admin note: substantial text overlap with arXiv:2401.09643

arXiv:2401.09643 [pdf]

OFDM Reference Signal Pattern Design Criteria for Integrated Communication and Sensing

Authors: Rui Zhang, Shawn Tsai, Tzu-Han Chou, Jiaying Ren, Wenze Qu, Oliver Sun

Abstract: Ambiguity performance, which indicates the maximum detectable region for target parameter estimation, is critical to radar sensor design. Driven by ambiguity performance requirements of bi-static sensing, we propose design criteria for orthogonal frequency division multiplexing (OFDM) reference signal (RS) patterns. The design not only reduces ambiguities in both time delay and Doppler shift domains under different types of sensing algorithms, but also reduces resource overhead for integrated communication and sensing. With minimal modifications of post-FFT processing for current RS patterns, the guard interval is extended beyond the conventional cyclic prefix (CP), while maintaining inter-symbol-interference-(ISI)-free delay estimation. For standard-resolution sensing algorithms, a staggering offset of a linear slope that is relatively prime to the RS comb size is suggested. As for high-resolution sensing algorithms, necessary and sufficient conditions of comb RS staggering offsets, plus new patterns synthesized therefrom, are derived for the corresponding achievable ambiguity performance. Furthermore, we generalize the RS pattern design criterion for high-resolution sensing algorithms to irregular forms, which minimizes the number of resource elements (REs) for associated algorithms to eliminate all side peaks. Starting from the staggered comb pattern in the current positioning RS, our generalized design eventually removes any regular form for ultimate flexibility.
Overall, the proposed techniques are promising to extend the ISI- and ambiguity-free range of distance and speed estimates for radar sensing.
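The "relatively prime" condition mentioned in this abstract can be illustrated with a small sketch. It assumes a simplified model that is not taken from the paper: a comb of size K in which OFDM symbol l carries the RS at frequency offset (s * l) mod K, where s is the linear slope. The function names are hypothetical. Under this assumption, the offsets of K consecutive symbols visit every residue 0..K-1 exactly once if and only if gcd(s, K) = 1.

```python
from math import gcd


def staggered_offsets(comb_size: int, slope: int) -> list[int]:
    """Frequency offset used in each of comb_size consecutive symbols.

    Assumed model (illustrative only): symbol l uses offset (slope * l) mod comb_size.
    """
    return [(slope * l) % comb_size for l in range(comb_size)]


def covers_all_offsets(comb_size: int, slope: int) -> bool:
    """True if the staggered pattern visits every comb offset exactly once."""
    return len(set(staggered_offsets(comb_size, slope))) == comb_size


if __name__ == "__main__":
    for slope in (2, 5):
        print(slope, staggered_offsets(6, slope), covers_all_offsets(6, slope))
```

In this toy model, a slope that shares a factor with the comb size revisits some offsets and never reaches others, which is one intuitive reading of why a coprime slope is preferred; the paper's actual ambiguity analysis is not reproduced here.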

Submitted 25 April, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

arXiv:2401.02566 [pdf]

Siamese Residual Neural Network for Musical Shape Evaluation in Piano Performance Assessment

Authors: Xiaoquan Li, Stephan Weiss, Yijun Yan, Yinhe Li, Jinchang Ren, John Soraghan, Ming Gong

Abstract: Understanding and identifying musical shape plays an important role in music education and performance assessment. To simplify the otherwise time- and cost-intensive musical shape evaluation, in this paper we explore how artificial intelligence (AI) driven models can be applied. Considering musical shape evaluation as a classification problem, a light-weight Siamese residual neural network (S-ResNN) is proposed to automatically identify musical shapes. To assess the proposed approach in the context of piano musical shape evaluation, we have generated a new dataset, containing 4116 music pieces derived from 147 piano preparatory exercises and performed in 28 categories of musical shapes. The experimental results show that the S-ResNN significantly outperforms a number of benchmark methods in terms of the precision, recall and F1 score.

Submitted 4 January, 2024; originally announced January 2024.

Comments: X.Li, S.Weiss, Y.Yan, Y.Li, J.Ren, J.Soraghan, M.Gong,"Siamese residual neural network for musical shape evaluation in piano performance assessment" in Proc. of the 31st European Signal Processing Conference, Helsinki, Finland

arXiv:2311.09655 [pdf, other]

Multi-View Spectrogram Transformer for Respiratory Sound Classification

Authors: Wentao He, Yuchen Yan, Jianfeng Ren, Ruibin Bai, Xudong Jiang

Abstract: Deep neural networks have been applied to audio spectrograms for respiratory sound classification. Existing models often treat the spectrogram as a synthetic image while overlooking its physical characteristics. In this paper, a Multi-View Spectrogram Transformer (MVST) is proposed to embed different views of time-frequency characteristics into the vision transformer. Specifically, the proposed MVST splits the mel-spectrogram into different sized patches, representing the multi-view acoustic elements of a respiratory sound. These patches and positional embeddings are then fed into transformer encoders to extract the attentional information among patches through a self-attention mechanism. Finally, a gated fusion scheme is designed to automatically weigh the multi-view features to highlight the best one in a specific scenario. Experimental results on the ICBHI dataset demonstrate that the proposed MVST significantly outperforms state-of-the-art methods for classifying respiratory sounds.

Submitted 30 May, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

Comments: The paper was published at ICASSP 2024

arXiv:2310.02550 [pdf, ps, other]

doi 10.1109/TGCN.2023.3309657

Convergence Analysis and Latency Minimization for Semi-Federated Learning in Massive IoT Networks

Authors: Jianyang Ren, Wanli Ni, Hui Tian, Gaofeng Nie

Abstract: As the number of sensors becomes massive in Internet of Things (IoT) networks, the amount of data is humongous. To process data in real-time while protecting user privacy, federated learning (FL) has been regarded as an enabling technique to push edge intelligence into IoT networks with massive devices. However, FL latency increases dramatically due to the increase in the number of parameters in deep neural networks and the limited computation and communication capabilities of IoT devices. To address this issue, we propose a semi-federated learning (SemiFL) paradigm in which network pruning and over-the-air computation are efficiently applied. To be specific, each small base station collects the raw data from its served sensors and trains its local pruned model. After that, the global aggregation of local gradients is achieved through over-the-air computation. We first analyze the performance of the proposed SemiFL by deriving its convergence upper bound. To reduce latency, a convergence-constrained SemiFL latency minimization problem is formulated. By decoupling the original problem into several sub-problems, iterative algorithms are designed to solve them efficiently. Finally, numerical simulations are conducted to verify the effectiveness of our proposed scheme in reducing latency and guaranteeing the identification accuracy.

Submitted 3 October, 2023; originally announced October 2023.

Comments: This paper has been accepted by IEEE Transactions on Green Communications and Networking

arXiv:2309.04084 [pdf, other]

Towards Efficient SDRTV-to-HDRTV by Learning from Image Formation

Authors: Xiangyu Chen, Zheyuan Li, Zhengwen Zhang, Jimmy S. Ren, Yihao Liu, Jingwen He, Yu Qiao, Jiantao Zhou, Chao Dong

Abstract: Modern displays are capable of rendering video content with high dynamic range (HDR) and wide color gamut (WCG). However, the majority of available resources are still in standard dynamic range (SDR). As a result, there is significant value in transforming existing SDR content into the HDRTV standard. In this paper, we define and analyze the SDRTV-to-HDRTV task by modeling the formation of SDRTV/HDRTV content. Our analysis and observations indicate that a naive end-to-end supervised training pipeline suffers from severe gamut transition errors. To address this issue, we propose a novel three-step solution pipeline called HDRTVNet++, which includes adaptive global color mapping, local enhancement, and highlight refinement. The adaptive global color mapping step uses global statistics as guidance to perform image-adaptive color mapping. A local enhancement network is then deployed to enhance local details. Finally, we combine the two sub-networks above as a generator and achieve highlight consistency through GAN-based joint training. Our method is primarily designed for ultra-high-definition TV content and is therefore effective and lightweight for processing 4K resolution images. We also construct a dataset using HDR videos in the HDR10 standard, named HDRTV1K, that contains 1235 training images and 117 testing images, all in 4K resolution. Besides, we select five metrics to evaluate the results of SDRTV-to-HDRTV algorithms. Our final results demonstrate state-of-the-art performance both quantitatively and visually.
The code, model and dataset are available at https://github.com/xiaom233/HDRTVNet-plus.

Submitted 7 September, 2023; originally announced September 2023.

Comments: Extended version of HDRTVNet

arXiv:2307.00511 [pdf]

SUGAR: Spherical Ultrafast Graph Attention Framework for Cortical Surface Registration

Authors: Jianxun Ren, Ning An, Youjia Zhang, Danyang Wang, Zhenyu Sun, Cong Lin, Weigang Cui, Weiwei Wang, Ying Zhou, Wei Zhang, Qingyu Hu, Ping Zhang, Dan Hu, Danhong Wang, Hesheng Liu

Abstract: Cortical surface registration plays a crucial role in aligning cortical functional and anatomical features across individuals. However, conventional registration algorithms are computationally inefficient. Recently, learning-based registration algorithms have emerged as a promising solution, significantly improving processing efficiency. Nonetheless, there remains a gap in the development of a learning-based method that exceeds the state-of-the-art conventional methods simultaneously in computational efficiency, registration accuracy, and distortion control, despite the theoretically greater representational capabilities of deep learning approaches. To address the challenge, we present SUGAR, a unified unsupervised deep-learning framework for both rigid and non-rigid registration. SUGAR incorporates a U-Net-based spherical graph attention network and leverages the Euler angle representation for deformation. In addition to the similarity loss, we introduce fold and multiple distortion losses, to preserve topology and minimize various types of distortions. Furthermore, we propose a data augmentation strategy specifically tailored for spherical surface registration, enhancing the registration performance. Through extensive evaluation involving over 10,000 scans from 7 diverse datasets, we showed that our framework exhibits comparable or superior registration performance in accuracy, distortion, and test-retest reliability compared to conventional and learning-based methods.
Additionally, SUGAR achieves remarkable sub-second processing times, offering a notable speed-up of approximately 12,000 times in registering 9,000 subjects from the UK Biobank dataset in just 32 minutes. This combination of high registration performance and accelerated processing time may greatly benefit large-scale neuroimaging studies.

Submitted 2 July, 2023; originally announced July 2023.

arXiv:2306.15530 [pdf, other]

Fast and Automatic 3D Modeling of Antenna Structure Using CNN-LSTM Network for Efficient Data Generation

Authors: Zhaohui Wei, Zhao Zhou, Peng Wang, Jian Ren, Yingzeng Yin, Gert Frølund Pedersen, Ming Shen

Abstract: Deep learning-assisted antenna design methods such as surrogate models have gained significant popularity in recent years due to their potential to greatly increase design efficiencies by replacing the time-consuming full-wave electromagnetic (EM) simulations. However, a large amount of training data with sufficiently diverse and representative samples (antenna structure parameters, scattering properties, etc.) is mandatory for these methods to ensure good performance. Traditional antenna modeling methods relying on manual model construction and modification are time-consuming and cannot meet the requirement of efficient training data acquisition. In this study, we proposed a deep learning-assisted and image-based intelligent modeling approach for accelerating the data acquisition of antenna samples with different physical structures. Specifically, our method only needs an image of the antenna structure, usually available in scientific publications, as the input while the corresponding modeling codes (VBA language) are generated automatically. The proposed model mainly consists of two parts: Convolutional Neural Network (CNN) and Long Short Term Memory (LSTM) networks. The former is used for capturing features of antenna structure images and the latter is employed to generate the modeling codes. Through training, the proposed model can achieve fast and automatic data acquisition of antenna physical structures based on antenna images.
Experiment results show that the proposed method achieves a significant speed enhancement over the manual modeling approach. This approach lays the foundation for efficient data acquisition needed to build robust surrogate models in the future.

Submitted 27 June, 2023; originally announced June 2023.

arXiv:2306.04360 [pdf, other]

doi 10.1109/TAP.2022.3179898

Robust and Efficient Fault Diagnosis of mm-Wave Active Phased Arrays using Baseband Signal

Authors: Martin H. Nielsen, Yufeng Zhang, Changbin Xue, Jian Ren, Yingzeng Yin, Ming Shen, Gert F. Pedersen

Abstract: One key communication block in 5G and 6G radios is the active phased array (APA). To ensure reliable operation, efficient and timely fault diagnosis of APAs on-site is crucial. To date, fault diagnosis has relied on measurement of frequency domain radiation patterns using costly equipment and multiple strictly controlled measurement probes, which are time-consuming, complex, and therefore infeasible for on-site deployment. This paper proposes a novel method exploiting a Deep Neural Network (DNN) tailored to extract the features hidden in the baseband in-phase and quadrature signals for classifying the different faults. It requires only a single probe in one measurement point for fast and accurate diagnosis of the faulty elements and components in APAs. Validation of the proposed method is done using a commercial 28 GHz APA. Accuracies of 99% and 80% have been demonstrated for single- and multi-element failure detection, respectively. Three different test scenarios are investigated: on-off antenna elements, phase variations, and magnitude attenuation variations. In a low signal to noise ratio of 4 dB, stable fault detection accuracy above 90% is maintained. This is all achieved with a detection time of milliseconds (e.g., 6 ms), showing a high potential for on-site deployment.

Submitted 7 June, 2023; originally announced June 2023.

Comments: 10 pages

Journal ref: in IEEE Transactions on Antennas and Propagation, vol. 70, no. 7, pp. 5044-5053, July 2022

arXiv:2306.01002 [pdf, other]

doi 10.1016/j.oceaneng.2022.112626

Adaptive ship-radiated noise recognition with learnable fine-grained wavelet transform

Authors: Yuan Xie, Jiawei Ren, Ji Xu

Abstract: Analyzing the ocean acoustic environment is a tricky task. Background noise and variable channel transmission environment make it complicated to implement accurate ship-radiated noise recognition. Existing recognition systems are weak in addressing the variable underwater environment, thus leading to disappointing performance in practical application. In order to keep the recognition system robust in various underwater environments, this work proposes an adaptive generalized recognition system - AGNet (Adaptive Generalized Network). By converting fixed wavelet parameters into fine-grained learnable parameters, AGNet learns the characteristics of underwater sound at different frequencies. Its flexible and fine-grained design is conducive to capturing more background acoustic information (e.g., background noise, underwater transmission channel). To utilize the implicit information in wavelet spectrograms, AGNet adopts the convolutional neural network with parallel convolution attention modules as the classifier. Experiments reveal that our AGNet outperforms all baseline methods on several underwater acoustic datasets, and AGNet could benefit more from transfer learning. Moreover, AGNet shows robust performance against various interference factors.

Submitted 19 February, 2024; v1 submitted 31 May, 2023; originally announced June 2023.

Journal ref: Ocean Engineering 265 (2022): 112626

arXiv:2305.19612 [pdf, other]

doi 10.1121/10.0015053

Underwater-Art: Expanding Information Perspectives With Text Templates For Underwater Acoustic Target Recognition

Authors: Yuan Xie, Jiawei Ren, Ji Xu

Abstract: Underwater acoustic target recognition is an intractable task due to the complex acoustic source characteristics and sound propagation patterns. Limited by insufficient data and narrow information perspective, recognition models based on deep learning seem far from satisfactory in practical underwater scenarios. Although underwater acoustic signals are severely influenced by distance, channel depth, or other factors, annotations of relevant information are often non-uniform, incomplete, and hard to use. In our work, we propose to implement Underwater Acoustic Recognition based on Templates made up of rich relevant information (hereinafter called "UART"). We design templates to integrate relevant information from different perspectives into descriptive natural language. UART adopts an audio-spectrogram-text tri-modal contrastive learning framework, which endows UART with the ability to guide the learning of acoustic representations by descriptive natural language. Our experiments reveal that UART has better recognition capability and generalization performance than traditional paradigms. Furthermore, the pre-trained UART model could provide superior prior knowledge for the recognition model in the scenario without any auxiliary annotation.

Submitted 19 February, 2024; v1 submitted 31 May, 2023; originally announced May 2023.

Journal ref: The Journal of the Acoustical Society of America, 2022, 152(5): 2641-2651

arXiv:2305.11614 [pdf, other]

Two-Bit RIS-Aided Communications at 3.5GHz: Some Insights from the Measurement Results Under Multiple Practical Scenes

Authors: Shun Zhang, Haoran Sun, Runze Yu, Hongshenyuan Cui, Jian Ren, Feifei Gao, Shi Jin, Hongxiang Xie, Hao Wang

Abstract: In this paper, we propose a two-bit reconfigurable intelligent surface (RIS)-aided communication system, which mainly consists of a two-bit RIS, a transmitter and a receiver. A corresponding prototype verification system is designed to perform experimental tests in practical environments. The carrier frequency is set as 3.5 GHz, and the RIS array possesses 256 units, each of which adopts two-bit phase quantization. In particular, we adopt a self-developed broadband intelligent communication system 40MHz-Net (BICT-40N) terminal in order to fully acquire the channel information. The terminal mainly includes a baseband board and a radio frequency (RF) front-end board, where the latter can achieve 26 dB transmitting link gain and 33 dB receiving link gain. The orthogonal frequency division multiplexing (OFDM) signal is used for the terminal, where the bandwidth is 40 MHz and the subcarrier spacing is 625 kHz. Also, the terminal supports a series of modulation modes, including QPSK, QAM, etc. Through experimental tests, we validate a few functions and properties of the RIS as follows. First, we validate a novel RIS power consumption model, which considers both the static and the dynamic power consumption. Besides, we demonstrate the existence of the imaging interference and find that two-bit RIS can lower the imaging interference about 10 dBm. Moreover, we verify that the RIS can outperform the metal plate in terms of the beam focusing performance. In addition, we find that the RIS has the ability to improve the channel stationarity.
Then, we realize the multi-beam reflection of the RIS utilizing the pattern addition (PA) algorithm. Lastly, we validate the existence of the mutual coupling between different RIS units.

Submitted 19 May, 2023; originally announced May 2023.

arXiv:2207.05032 [pdf, other]

Computer Vision-Aided Reconfigurable Intelligent Surface-Based Beam Tracking: Prototyping and Experimental Results

Authors: Ming Ouyang, Yucong Wang, Feifei Gao, Shun Zhang, Puchu Li, Jian Ren

Abstract: In this paper, we propose a novel computer vision-based approach to aid Reconfigurable Intelligent Surface (RIS) for dynamic beam tracking and then implement the corresponding prototype verification system. A camera is attached at the RIS to obtain the visual information about the surrounding environment, with which RIS identifies the desired reflected beam direction and then adjusts the reflection coefficients according to the pre-designed codebook. Compared to the conventional approaches that utilize channel estimation or beam sweeping to obtain the reflection coefficients, the proposed one not only saves beam training overhead but also eliminates the requirement for extra feedback links. We build a 20-by-20 RIS running at 5.4 GHz and develop a high-speed control board to ensure the real-time refresh of the reflection coefficients. Meanwhile we implement an independent peer-to-peer communication system to simulate the communication between the base station and the user equipment. The vision-aided RIS prototype system is tested in two mobile scenarios: RIS works in near-field conditions as a passive array antenna of the base station; RIS works in far-field conditions to assist the communication between the base station and the user equipment. The experimental results show that RIS can quickly adjust the reflection coefficients for dynamic beam tracking with the help of visual information.

Submitted 11 July, 2022; originally announced July 2022.

arXiv:2206.11695 [pdf, other]

NTIRE 2022 Challenge on Perceptual Image Quality Assessment

Authors: Jinjin Gu, Haoming Cai, Chao Dong, Jimmy S. Ren, Radu Timofte

Abstract: This paper reports on the NTIRE 2022 challenge on perceptual image quality assessment (IQA), held in conjunction with the New Trends in Image Restoration and Enhancement (NTIRE) workshop at CVPR 2022. This challenge is held to address the emerging challenge of IQA by perceptual image processing algorithms. The output images of these algorithms have completely different characteristics from traditional distortions and are included in the PIPAL dataset used in this challenge. This challenge is divided into two tracks, a full-reference IQA track similar to the previous NTIRE IQA challenge and a new track that focuses on the no-reference IQA methods. The challenge has 192 and 179 registered participants for the two tracks. In the final testing stage, 7 and 8 participating teams submitted their models and fact sheets. Almost all of them have achieved better results than existing IQA methods, and the winning method can demonstrate state-of-the-art performance.

Submitted 23 June, 2022; originally announced June 2022.

Comments: This report has been published in CVPR 2022 NTIRE workshop. arXiv admin note: text overlap with arXiv:2105.03072

arXiv:2205.14271 [pdf, other]

doi 10.1109/LCOMM.2022.3174295

Towards Communication-Learning Trade-off for Federated Learning at the Network Edge

Authors: Jianyang Ren, Wanli Ni, Hui Tian

Abstract: In this letter, we study a wireless federated learning (FL) system where network pruning is applied to local users with limited resources. Although pruning is beneficial to reduce FL latency, it also deteriorates learning performance due to the information loss. Thus, a trade-off problem between communication and learning is raised. To address this challenge, we quantify the effects of network pruning and packet error on the learning performance by deriving the convergence rate of FL with a non-convex loss function. Then, closed-form solutions for pruning control and bandwidth allocation are proposed to minimize the weighted sum of FL latency and FL performance. Finally, numerical results demonstrate that 1) our proposed solution can outperform benchmarks in terms of cost reduction and accuracy guarantee, and 2) a higher pruning rate would bring less communication overhead but also worsen FL accuracy, which is consistent with our theoretical analysis.

Submitted 27 May, 2022; originally announced May 2022.

Comments: This paper has been accepted by IEEE Communications Letters

Journal ref: IEEE Communications Letters, 2022

arXiv:2201.01166 [pdf, other]

Deep Learning-based Predictive Control of Battery Management for Frequency Regulation

Authors: Yun Li, Yixiu Wang, Yifu Chen, Kaixun Hua, Jiayang Ren, Ghazaleh Mozafari, Qiugang Lu, Yankai Cao

Abstract: This paper proposes a deep learning-based optimal battery management scheme for frequency regulation (FR) by integrating model predictive control (MPC), supervised learning (SL), reinforcement learning (RL), and high-fidelity battery models. By taking advantage of deep neural networks (DNNs), the derived DNN-approximated policy is computationally efficient in online implementation. The design procedure of the proposed scheme consists of two sequential processes: (1) the SL process, in which we first run a simulation with an MPC embedding a low-fidelity battery model to generate a training data set, and then, based on the generated data set, we optimize a DNN-approximated policy using SL algorithms; and (2) the RL process, in which we utilize RL algorithms to improve the performance of the DNN-approximated policy by balancing short-term economic incentives and long-term battery degradation. The SL process speeds up the subsequent RL process by providing a good initialization. By utilizing RL algorithms, one prominent property of the proposed scheme is that it can learn from the data generated by simulating the FR policy on the high-fidelity battery simulator to adjust the DNN-approximated policy, which is originally based on low-fidelity battery model. A case study using real-world data of FR signals and prices is performed. Simulation results show that, compared to conventional MPC schemes, the proposed deep learning-based scheme can effectively achieve higher economic benefits of FR participation while maintaining lower online computational cost.

Submitted 4 January, 2022; originally announced January 2022.

Comments: 30 pages, 5 figures, 2 tables

arXiv:2111.12869 [pdf, other]

Polyphonic Sound Event Detection Using Capsule Neural Network on Multi-Type-Multi-Scale Time-Frequency Representation

Authors: Wangkai Jin, Junyu Liu, Jianfeng Ren, Xiangjun Peng

Abstract: The challenges of polyphonic sound event detection (PSED) stem from the detection of multiple overlapping events in a time series. Recent efforts exploit Deep Neural Networks (DNNs) on Time-Frequency Representations (TFRs) of audio clips as model inputs to mitigate such issues. However, existing solutions often rely on a single type of TFR, which causes under-utilization of input features. To this end, we propose a novel PSED framework, which incorporates Multi-Type-Multi-Scale TFRs. Our key insight is that: TFRs, which are of different types or in different scales, can reveal acoustics patterns in a complementary manner, so that the overlapped events can be best extracted by combining different TFRs. Moreover, our framework design applies a novel approach, to adaptively fuse different models and TFRs symbiotically. Hence, the overall performance can be significantly improved. We quantitatively examine the benefits of our framework by using Capsule Neural Networks, a state-of-the-art approach for PSED. The experimental results show that our method achieves a reduction of 7% in error rate compared with the state-of-the-art solutions on the TUT-SED 2016 dataset.

Submitted 24 November, 2021; originally announced November 2021.

Comments: Under review at ICASSP 2022

arXiv:2111.12290 [pdf, other]

Attention-based Dual-stream Vision Transformer for Radar Gait Recognition

Authors: Shiliang Chen, Wentao He, Jianfeng Ren, Xudong Jiang

Abstract: Radar gait recognition is robust to light variations and infringes less on privacy. Previous studies often utilize either spectrograms or cadence velocity diagrams. While the former shows the time-frequency patterns, the latter encodes the repetitive frequency patterns. In this work, a dual-stream neural network with attention-based fusion is proposed to fully aggregate the discriminant information from these two representations. Both streams are designed based on the Vision Transformer, which well captures the gait characteristics embedded in these representations. The proposed method is validated on a large benchmark dataset for radar gait recognition, which shows that it significantly outperforms state-of-the-art solutions.

Submitted 24 November, 2021; originally announced November 2021.

Comments: Under review

arXiv:2110.09662 [pdf, other]

Osteoporosis Prescreening using Panoramic Radiographs through a Deep Convolutional Neural Network with Attention Mechanism

Authors: Heng Fan, Jiaxiang Ren, Jie Yang, Yi-Xian Qin, Haibin Ling

Abstract: Objectives. The aim of this study was to investigate whether a deep convolutional neural network (CNN) with an attention module can detect osteoporosis on panoramic radiographs. Study Design. A dataset of 70 panoramic radiographs (PRs) from 70 different subjects aged between 49 and 60 was used, including 49 subjects with osteoporosis and 21 normal subjects. We utilized the leave-one-out cross-validation approach to generate 70 training and test splits. Specifically, for each split, one image was used for testing and the remaining 69 images were used for training. A deep convolutional neural network (CNN) using the Siamese architecture was implemented through a fine-tuning process to classify a PR image using patches extracted from eight representative trabecular bone areas (Figure 1). In order to automatically learn the importance of different PR patches, an attention module was integrated into the deep CNN. Three metrics, including osteoporosis accuracy (OPA), non-osteoporosis accuracy (NOPA) and overall accuracy (OA), were utilized for performance evaluation. Results. The proposed baseline CNN approach achieved the OPA, NOPA and OA scores of 0.667, 0.878 and 0.814, respectively. With the help of the attention module, the OPA, NOPA and OA scores were further improved to 0.714, 0.939 and 0.871, respectively. Conclusions. The proposed method obtained promising results using deep CNN with an attention module, which might be applied to osteoporosis prescreening.

Submitted 18 October, 2021; originally announced October 2021.

Comments: 9 pages

arXiv:2108.07978 [pdf, other]

A New Journey from SDRTV to HDRTV

Authors: Xiangyu Chen, Zhengwen Zhang, Jimmy S. Ren, Lynhoo Tian, Yu Qiao, Chao Dong

Abstract: Nowadays modern displays are capable of rendering video content with high dynamic range (HDR) and wide color gamut (WCG). However, most available resources are still in standard dynamic range (SDR). Therefore, there is an urgent demand to transform existing SDR-TV contents into their HDR-TV versions. In this paper, we conduct an analysis of SDRTV-to-HDRTV task by modeling the formation of SDRTV/HDRTV content. Based on the analysis, we propose a three-step solution pipeline including adaptive global color mapping, local enhancement and highlight generation. Moreover, the above analysis inspires us to present a lightweight network that utilizes global statistics as guidance to conduct image-adaptive color mapping. In addition, we construct a dataset using HDR videos in HDR10 standard, named HDRTV1K, and select five metrics to evaluate the results of SDRTV-to-HDRTV algorithms. Furthermore, our final results achieve state-of-the-art performance in quantitative comparisons and visual quality. The code and dataset are available at https://github.com/chxy95/HDRTVNet.

Submitted 25 September, 2021; v1 submitted 18 August, 2021; originally announced August 2021.

Comments: Accepted to ICCV

arXiv:2107.02412 [pdf, ps, other]

GBLinks: GNN-Based Beam Selection and Link Activation for Ultra-dense D2D mmWave Networks

Authors: S. He, S. Xiong, W. Zhang, Y. Yang, J. Ren, Y. Huang

Abstract: In this paper, we consider the problem of joint beam selection and link activation across a set of communication pairs to effectively control the interference between communication pairs by deactivating some communication pairs in ultra-dense device-to-device (D2D) mmWave communication networks. The resulting optimization problem is formulated as an integer programming problem that is nonconvex and NP-hard. Consequently, the global optimal solution, even the local optimal solution, cannot be generally obtained. To overcome this challenge, this paper resorts to designing a deep learning architecture based on graph neural network to perform the joint beam selection and link activation, taking the network topology information into account. Meanwhile, we present an unsupervised Lagrangian dual learning framework to train the parameters of the GBLinks model. Numerical results show that the proposed GBLinks model converges to a stable point, in terms of the weighted sum rate, as the number of iterations increases. Furthermore, the GBLinks model can reach a near-optimal solution in comparison with the exhaustive search scheme in small-scale ultra-dense D2D mmWave communication networks and outperforms GreedyNoSched and the SCA-based method. It also shows that the GBLinks model can generalize to varying densities and coverage regions of ultra-dense D2D mmWave communication networks.

Submitted 29 December, 2021; v1 submitted 6 July, 2021; originally announced July 2021.

Comments: 31 pages, 9 figures, submitted to IEEE Trans. on Commun., July 2021, major revised in Dec. 2021

arXiv:2105.14758 [pdf, other]

Low-Dose CT Denoising Using a Structure-Preserving Kernel Prediction Network

Authors: Lu Xu, Yuwei Zhang, Ying Liu, Daoye Wang, Mu Zhou, Jimmy Ren, Jingwei Wei, Zhaoxiang Ye

Abstract: Low-dose CT has been a key diagnostic imaging modality to reduce the potential risk of radiation overdose to patient health. Despite recent advances, CNN-based approaches typically apply filters in a spatially invariant way and adopt similar pixel-level losses, which treat all regions of the CT image equally and can be inefficient when fine-grained structures coexist with non-uniformly distributed noises. To address this issue, we propose a Structure-preserving Kernel Prediction Network (StructKPN) that combines the kernel prediction network with a structure-aware loss function that utilizes the pixel gradient statistics and guides the model towards spatially-variant filters that enhance noise removal, prevent over-smoothing and preserve detailed structures for different regions in CT imaging. Extensive experiments demonstrated that our approach achieved superior performance on both synthetic and non-synthetic datasets, and better preserves structures that are highly desired in clinical screening and low-dose protocol optimization.

Submitted 23 July, 2021; v1 submitted 31 May, 2021; originally announced May 2021.

Comments: ICIP2021

arXiv:2105.03072 [pdf, other]

NTIRE 2021 Challenge on Perceptual Image Quality Assessment

Authors: Jinjin Gu, Haoming Cai, Chao Dong, Jimmy S. Ren, Yu Qiao, Shuhang Gu, Radu Timofte, Manri Cheon, Sungjun Yoon, Byungyeon Kang, Junwoo Lee, Qing Zhang, Haiyang Guo, Yi Bin, Yuqing Hou, Hengliang Luo, Jingyu Guo, Zirui Wang, Hai Wang, Wenming Yang, Qingyan Bai, Shuwei Shi, Weihao Xia, Mingdeng Cao, Jiahao Wang , et al. (25 additional authors not shown)

Abstract: This paper reports on the NTIRE 2021 challenge on perceptual image quality assessment (IQA), held in conjunction with the New Trends in Image Restoration and Enhancement (NTIRE) workshop at CVPR 2021. As a new type of image processing technology, perceptual image processing algorithms based on Generative Adversarial Networks (GAN) have produced images with more realistic textures. These output images have completely different characteristics from traditional distortions, and thus pose a new challenge for IQA methods to evaluate their visual quality. In comparison with previous IQA challenges, the training and testing datasets in this challenge include the outputs of perceptual image processing algorithms and the corresponding subjective scores. Thus they can be used to develop and evaluate IQA methods on GAN-based distortions. The challenge has 270 registered participants in total. In the final testing stage, 13 participating teams submitted their models and fact sheets. Almost all of them have achieved much better results than existing IQA methods, while the winning method can demonstrate state-of-the-art performance.

Submitted 28 June, 2021; v1 submitted 7 May, 2021; originally announced May 2021.

arXiv:2104.09177 [pdf, ps, other]

Research on Resource Allocation for Efficient Federated Learning

Authors: Jianyang Ren, Wanli Ni, Gaofeng Nie, Hui Tian

Abstract: As a promising solution to achieve efficient learning among isolated data owners and solve data privacy issues, federated learning is receiving wide attention. Using the edge server as an intermediary can effectively collect sensor data, perform local model training, and upload model parameters for global aggregation. So this paper proposes a new framework for resource allocation in a hierarchical network supported by edge computing. In this framework, we minimize the weighted sum of system cost and learning cost by optimizing bandwidth, computing frequency, power allocation and subcarrier assignment. To solve this challenging mixed-integer non-linear problem, we first decouple the bandwidth optimization problem (P1) from the whole problem and obtain a closed-form solution. The remaining computational frequency, power, and subcarrier joint optimization problem (P2) can be further decomposed into two sub-problems: latency and computational frequency optimization problem (P3) and transmission power and subcarrier optimization problem (P4). P3 is a convex optimization problem that is easy to solve. In the joint optimization problem (P4), the optimal power under each subcarrier selection can be obtained first through the successive convex approximation (SCA) algorithm. Substituting the optimal power value obtained back to P4, the subproblem can be regarded as an assignment problem, so the Hungarian algorithm can be effectively used to solve it. The solution of problem P2 is accomplished by solving P3 and P4 iteratively.
To verify the performance of the algorithm, we compare the proposed algorithm with five algorithms, namely Equal bandwidth allocation, Learning cost guaranteed, Greedy subcarrier allocation, System cost guaranteed and Time-biased algorithm. Numerical results show the significant performance gain and the robustness of the proposed algorithm in the face of parameter changes.

Submitted 12 September, 2022; v1 submitted 19 April, 2021; originally announced April 2021.

Comments: 14 pages, 13 figures

arXiv:2103.11500 [pdf, other]

Sinusoidal Parameter Estimation from Signed Measurements via Majorization-Minimization Based RELAX

Authors: Jiaying Ren, Tianyi Zhang, Jian Li, Petre Stoica

Abstract: We consider the problem of sinusoidal parameter estimation using signed observations obtained via one-bit sampling with fixed as well as time-varying thresholds. In a previous paper, a relaxation-based algorithm, referred to as 1bRELAX, has been proposed to iteratively maximize the likelihood function. However, the exhaustive search procedure used in each iteration of 1bRELAX is time-consuming. In this paper, we present a majorization-minimization (MM) based 1bRELAX algorithm, referred to as 1bMMRELAX, to enhance the computational efficiency of 1bRELAX. Using the MM technique, 1bMMRELAX maximizes the likelihood function iteratively using simple FFT operations instead of the more computationally intensive search used by 1bRELAX. Both simulated and experimental results are presented to show that 1bMMRELAX can significantly reduce the computational cost of 1bRELAX while maintaining its excellent estimation accuracy.

Submitted 21 March, 2021; originally announced March 2021.

arXiv:2103.10827 [pdf, other]

Joint RFI Mitigation and Radar Echo Recovery for One-Bit UWB Radar

Authors: Tianyi Zhang, Jiaying Ren, Jian Li, Lam H. Nguyen, Petre Stoica

Abstract: Radio frequency interference (RFI) mitigation and radar echo recovery are critically important for the proper functioning of ultra-wideband (UWB) radar systems using one-bit sampling techniques. We recently introduced a technique for one-bit UWB radar, which first uses a majorization-minimization method for RFI parameter estimation followed by a sparse method for radar echo recovery. However, this technique suffers from high computational complexity due to the need to estimate the parameters of each RFI source separately and iteratively. In this paper, we present a computationally efficient joint RFI mitigation and radar echo recovery framework to greatly reduce the computational cost. Specifically, we exploit the sparsity of RFI in the fast-frequency domain and the sparsity of radar echoes in the fast-time domain to design a one-bit weighted SPICE (SParse Iterative Covariance-based Estimation) based framework for the joint RFI mitigation and radar echo recovery of one-bit UWB radar. Both simulated and experimental results are presented to show that the proposed one-bit weighted SPICE framework can not only reduce the computational cost but also outperform the existing approach for decoupled RFI mitigation and radar echo recovery of one-bit UWB radar.

Submitted 19 March, 2021; originally announced March 2021.

Comments: arXiv admin note: text overlap with arXiv:2102.08987

arXiv:2103.01624 [pdf, other]

Efficient Deep Image Denoising via Class Specific Convolution

Authors: Lu Xu, Jiawei Zhang, Xuanye Cheng, Feng Zhang, Xing Wei, Jimmy Ren

Abstract: Deep neural networks have been widely used in image denoising during the past few years. Even though they achieve great success on this problem, they are computationally inefficient, which makes them unsuitable for implementation on mobile devices. In this paper, we propose an efficient deep neural network for image denoising based on pixel-wise classification. Although a computationally efficient network cannot effectively remove noise from arbitrary content, it is still capable of denoising a specific type of pattern or texture. The proposed method follows such a divide and conquer scheme. We first use an efficient U-net to classify the pixels in the noisy image based on the local gradient statistics. Then we replace part of the convolution layers in existing denoising networks by the proposed Class Specific Convolution layers (CSConv) which use different weights for different classes of pixels. Quantitative and qualitative evaluations on public datasets demonstrate that the proposed method can reduce the computational costs without sacrificing the performance compared to state-of-the-art algorithms.

Submitted 4 August, 2021; v1 submitted 2 March, 2021; originally announced March 2021.

Comments: The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)

arXiv:2102.08987 [pdf, other]

RFI Mitigation for One-bit UWB Radar Systems

Authors: Tianyi Zhang, Jiaying Ren, Jian Li, Lam H. Nguyen, Petre Stoica

Abstract: Radio frequency interference (RFI) mitigation is critical to the proper operation of ultra-wideband (UWB) radar systems since RFI can severely degrade the radar imaging capability and target detection performance. In this paper, we address the RFI mitigation problem for one-bit UWB radar systems. A one-bit UWB system obtains its signed measurements via a low-cost and high rate sampling scheme, referred to as the Continuous Time Binary Value (CTBV) technology. This sampling strategy compares the signal to a known threshold varying with slow-time and therefore can be used to achieve a rather high sampling rate and quantization resolution with rather simple and affordable hardware. This paper establishes a proper data model for the RFI sources and proposes a novel RFI mitigation method for the one-bit UWB radar system that uses the CTBV sampling technique. Specifically, we first model the RFI sources as a sum of sinusoids with frequencies fixed during the coherent processing interval (CPI) and we exploit the sparsity of the RFI spectrum. We extend a majorization-minimization based 1bRELAX algorithm, referred to as 1bMMRELAX, to estimate the RFI source parameters from the signed measurements obtained by using the CTBV sampling strategy. We also devise a new fast frequency initialization method based on the Alternating Direction Method of Multipliers (ADMM) methodology for the extended 1bMMRELAX algorithm to significantly improve its computational efficiency.
Moreover, an ADMM-based sparse method is introduced to recover the desired radar echoes using the estimated RFI parameters. Both simulated and experimental results are presented to demonstrate that our proposed algorithm outperforms the existing digital integration method, especially for severe RFI cases.

Submitted 25 March, 2021; v1 submitted 17 February, 2021; originally announced February 2021.

arXiv:2012.01618 [pdf, other]

Matrix Completion Methods for the Total Electron Content Video Reconstruction

Authors: Hu Sun, Zhijun Hua, Jiaen Ren, Shasha Zou, Yuekai Sun, Yang Chen

Abstract: The total electron content (TEC) maps can be used to estimate the signal delay of GPS due to the ionospheric electron content between a receiver and satellite. This delay can result in GPS positioning error. Thus it is important to monitor the TEC maps. The observed TEC maps have big patches of missingness in the ocean and scattered small areas of missingness on the land. In this paper, we propose several extensions of existing matrix completion algorithms to achieve TEC map reconstruction, accounting for spatial smoothness and temporal consistency while preserving important structures of the TEC maps. We call the proposed method Video Imputation with SoftImpute, Temporal smoothing and Auxiliary data (VISTA). Numerical simulations that mimic patterns of real data are given. We show that our proposed method achieves better reconstructed TEC maps as compared to existing methods in literature. Our proposed computational algorithm is general and can be readily applied for other problems besides TEC map reconstruction.

Submitted 12 January, 2022; v1 submitted 2 December, 2020; originally announced December 2020.

arXiv:2011.15002 [pdf, other]

Image Quality Assessment for Perceptual Image Restoration: A New Dataset, Benchmark and Metric

Authors: Jinjin Gu, Haoming Cai, Haoyu Chen, Xiaoxing Ye, Jimmy Ren, Chao Dong

Abstract: Image quality assessment (IQA) is the key factor for the fast development of image restoration (IR) algorithms. The most recent perceptual IR algorithms based on generative adversarial networks (GANs) have brought in significant improvement on visual performance, but also pose great challenges for quantitative evaluation. Notably, we observe an increasing inconsistency between perceptual quality and the evaluation results. We present two questions: Can existing IQA methods objectively evaluate recent IR algorithms? With the focus on beating current benchmarks, are we getting better IR algorithms? To answer the questions and promote the development of IQA methods, we contribute a large-scale IQA dataset, called Perceptual Image Processing ALgorithms (PIPAL) dataset. In particular, this dataset includes the results of GAN-based IR algorithms, which are missing in previous datasets. We collect more than 1.13 million human judgments to assign subjective scores for PIPAL images using the more reliable Elo system. Based on PIPAL, we present new benchmarks for both IQA and SR methods. Our results indicate that existing IQA methods cannot fairly evaluate GAN-based IR algorithms. While using appropriate evaluation methods is important, IQA methods should also be updated along with the development of IR algorithms. Finally, we shed light on how to improve the IQA performance on GAN-based distortion.
Inspired by the finding that the existing IQA methods have an unsatisfactory performance on the GAN-based distortion partially because of their low tolerance to spatial misalignment, we propose to improve the performance of an IQA network on GAN-based distortion by explicitly considering this misalignment. We propose the Space Warping Difference Network, which includes the novel l_2 pooling layers and Space Warping Difference layers. Experiments demonstrate the effectiveness of the proposed method.

Submitted 30 November, 2020; originally announced November 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:2007.12142

arXiv:2007.12142 [pdf, other]

PIPAL: a Large-Scale Image Quality Assessment Dataset for Perceptual Image Restoration

Authors: Jinjin Gu, Haoming Cai, Haoyu Chen, Xiaoxing Ye, Jimmy Ren, Chao Dong

Abstract: Image quality assessment (IQA) is the key factor for the fast development of image restoration (IR) algorithms. The most recent IR methods based on Generative Adversarial Networks (GANs) have achieved significant improvement in visual performance, but also presented great challenges for quantitative evaluation. Notably, we observe an increasing inconsistency between perceptual quality and the evaluation results. We then raise two questions: (1) Can existing IQA methods objectively evaluate recent IR algorithms? (2) When focusing on beating current benchmarks, are we getting better IR algorithms? To answer these questions and promote the development of IQA methods, we contribute a large-scale IQA dataset, called the Perceptual Image Processing Algorithms (PIPAL) dataset. In particular, this dataset includes the results of GAN-based methods, which are missing in previous datasets. We collect more than 1.13 million human judgments to assign subjective scores for PIPAL images using the more reliable "Elo system". Based on PIPAL, we present new benchmarks for both IQA and super-resolution methods. Our results indicate that existing IQA methods cannot fairly evaluate GAN-based IR algorithms. While using appropriate evaluation methods is important, IQA methods should also be updated along with the development of IR algorithms. Finally, we improve the performance of IQA networks on GAN-based distortions by introducing anti-aliasing pooling. Experiments show the effectiveness of the proposed method.

Submitted 26 September, 2020; v1 submitted 23 July, 2020; originally announced July 2020.

Comments: This paper has been accepted for publication at ECCV2020

Showing 1–50 of 66 results for author: Ren, J