
SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data

Authors: Yichen Lu, Jiaqi Song, Xuankai Chang, Hengwei Bian, Soumi Maiti, Shinji Watanabe

Abstract: In this work, we present SynesLM, a unified model which can perform three multimodal language understanding tasks: audio-visual automatic speech recognition (AV-ASR) and visual-aided speech/machine translation (VST/VMT). Unlike previous research that focused on lip motion as visual cues for speech signals, our work explores more general visual information within entire frames, such as objects and actions. Additionally, we use synthetic image data to enhance the correlation between image and speech data. We benchmark SynesLM against the How2 dataset, demonstrating performance on par with state-of-the-art (SOTA) models dedicated to AV-ASR while maintaining our multitasking framework. Remarkably, for zero-shot AV-ASR, SynesLM achieved SOTA performance by lowering the Word Error Rate (WER) from 43.4% to 39.4% on the VisSpeech Dataset. Furthermore, our results in VST and VMT outperform the previous results, improving the BLEU score to 43.5 from 37.2 for VST, and to 54.8 from 54.4 for VMT.

Submitted 1 August, 2024; originally announced August 2024.

arXiv:2407.12538 [pdf, other]

High Frequency Matters: Uncertainty Guided Image Compression with Wavelet Diffusion

Authors: Juan Song, Jiaxiang He, Mingtao Feng, Keyan Wang, Yunsong Li, Ajmal Mian

Abstract: Diffusion probabilistic models have recently achieved remarkable success in generating high-quality images. However, balancing high perceptual quality and low distortion remains challenging in image compression applications. To address this issue, we propose an efficient Uncertainty-Guided image compression approach with wavelet Diffusion (UGDiff). Our approach focuses on high frequency compression via the wavelet transform, since high frequency components are crucial for reconstructing image details. We introduce a wavelet conditional diffusion model for high frequency prediction, followed by a residual codec that compresses and transmits prediction residuals to the decoder. This diffusion prediction-then-residual compression paradigm effectively addresses the low fidelity issue common in direct reconstructions by existing diffusion models. Considering the uncertainty from the random sampling of the diffusion model, we further design an uncertainty-weighted rate-distortion (R-D) loss tailored for residual compression, providing a more rational trade-off between rate and distortion. Comprehensive experiments on two benchmark datasets validate the effectiveness of UGDiff, surpassing state-of-the-art image compression methods in R-D performance, perceptual quality, subjective quality, and inference time. Our code is available at: https://github.com/hejiaxiang1/Wavelet-Diffusion/tree/main

Submitted 17 July, 2024; originally announced July 2024.

arXiv:2407.07110 [pdf, other]

Foundation Models for Electrocardiograms

Authors: Junho Song, Jong-Hwan Jang, Byeong Tak Lee, DongGyun Hong, Joon-myoung Kwon, Yong-Yeon Jo

Abstract: Foundation models, enhanced by self-supervised learning (SSL) techniques, represent a cutting-edge frontier in biomedical signal analysis, particularly for electrocardiograms (ECGs), crucial for cardiac health monitoring and diagnosis. This study conducts a comprehensive analysis of foundation models for ECGs by employing and refining innovative SSL methodologies - namely, generative and contrastive learning - on a vast dataset of over 1.1 million ECG samples. By customizing these methods to align with the intricate characteristics of ECG signals, our research has successfully developed foundation models that significantly elevate the precision and reliability of cardiac diagnostics. These models are adept at representing the complex, subtle nuances of ECG data, thus markedly enhancing diagnostic capabilities. The results underscore the substantial potential of SSL-enhanced foundation models in clinical settings and pave the way for extensive future investigations into their scalable applications across a broader spectrum of medical diagnostics. This work sets a benchmark in the ECG field, demonstrating the profound impact of tailored, data-driven model training on the efficacy and accuracy of medical diagnostics.

Submitted 25 June, 2024; originally announced July 2024.

Comments: 27 pages

arXiv:2407.04561 [pdf, other]

Wireless Spectrum in Rural Farmlands: Status, Challenges and Opportunities

Authors: Mukaram Shahid, Kunal Das, Taimoor Ul Islam, Christ Somiah, Daji Qiao, Arsalan Ahmad, Jimming Song, Zhengyuan Zhu, Sarath Babu, Yong Guan, Tusher Chakraborty, Suraj Jog, Ranveer Chandra, Hongwei Zhang

Abstract: Due to factors such as low population density and expansive geographical distances, network deployment falls behind in rural regions, leading to a broadband divide. Wireless spectrum serves as the blood and flesh of wireless communications. Shared white spaces such as those in the TVWS and CBRS spectrum bands offer opportunities to expand connectivity, innovate, and provide affordable access to high-speed Internet in under-served areas without the additional cost of expensive licensed spectrum. However, the current methods to utilize these white spaces are inefficient due to very conservative models and spectrum policies, causing under-utilization of valuable spectrum resources. This hampers the full potential of innovative wireless technologies that could benefit farmers, small Internet Service Providers (ISPs) or Mobile Network Operators (MNOs) operating in rural regions. This study explores the challenges faced by farmers and service providers when using shared spectrum bands to deploy their networks while ensuring maximum system performance and minimizing interference with other users. Additionally, we discuss how spatiotemporal spectrum models, in conjunction with database-driven spectrum-sharing solutions, can enhance the allocation and management of spectrum resources, ultimately improving the efficiency and reliability of wireless networks operating in shared spectrum bands.

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2406.03912 [pdf, other]

GenSafe: A Generalizable Safety Enhancer for Safe Reinforcement Learning Algorithms Based on Reduced Order Markov Decision Process Model

Authors: Zhehua Zhou, Xuan Xie, Jiayang Song, Zhan Shu, Lei Ma

Abstract: Although deep reinforcement learning has demonstrated impressive achievements in controlling various autonomous systems, e.g., autonomous vehicles or humanoid robots, its inherent reliance on random exploration raises safety concerns in their real-world applications. To improve system safety during the learning process, a variety of Safe Reinforcement Learning (SRL) algorithms have been proposed, which usually incorporate safety constraints within the Constrained Markov Decision Process (CMDP) framework. However, the efficacy of these SRL algorithms often relies on accurate function approximations, a task that is notably challenging to accomplish in the early learning stages due to data insufficiency. To address this problem, we introduce a Generalizable Safety enhancer (GenSafe) in this work. Leveraging model order reduction techniques, we first construct a Reduced Order Markov Decision Process (ROMDP) as a low-dimensional proxy for the original cost function in CMDP. Then, by solving ROMDP-based constraints that are reformulated from the original cost constraints, the proposed GenSafe refines the actions taken by the agent to enhance the possibility of constraint satisfaction. Essentially, GenSafe acts as an additional safety layer for SRL algorithms, offering broad compatibility across diverse SRL approaches. The performance of GenSafe is examined on multiple SRL benchmark problems. The results show that it not only improves safety performance, especially in the early learning phases, but also maintains task performance at a satisfactory level.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2405.19665 [pdf]

A novel fault localization with data refinement for hydroelectric units

Authors: Jialong Huang, Junlin Song, Penglong Lian, Mengjie Gan, Zhiheng Su, Benhao Wang, Wenji Zhu, Xiaomin Pu, Jianxiao Zou, Shicai Fan

Abstract: Due to the scarcity of fault samples and the complexity of the non-linear and non-smooth data in hydroelectric units, most traditional fault localization methods for hydroelectric units struggle to localize faults accurately. To address these problems, a sparse autoencoder (SAE)-generative adversarial network (GAN)-wavelet noise reduction (WNR)-manifold-boosted deep learning (SG-WMBDL) based fault localization method for hydroelectric units is proposed. To overcome the data scarcity, an SAE is embedded into the GAN to generate more high-quality samples in the data generation module. Because the signals have non-linear and non-smooth characteristics, an improved WNR, which combines soft and hard thresholding, and local linear embedding (LLE) are used in the data preprocessing module to reduce noise and effectively capture local features. In addition, to seek higher performance, a novel Adaptive Boost (AdaBoost) combined with multiple deep learning models is proposed to achieve accurate fault localization. The experimental results show that SG-WMBDL can locate faults in hydroelectric units from a small number of fault samples with non-linear and non-smooth characteristics, with higher precision and accuracy than other frontier methods, which verifies the effectiveness and practicality of the proposed method.

Submitted 29 May, 2024; originally announced May 2024.

Comments: 6 pages, 4 figures, Conference on Decision and Control (CDC)

arXiv:2405.18844 [pdf, other]

Optical IRS for Visible Light Communication: Modeling, Design, and Open Issues

Authors: Shiyuan Sun, Fang Yang, Weidong Mei, Jian Song, Zhu Han, Rui Zhang

Abstract: Optical intelligent reflecting surface (OIRS) offers a new and effective approach to resolving the line-of-sight blockage issue in visible light communication (VLC) by enabling redirection of light to bypass obstacles, thereby dramatically enhancing indoor VLC coverage and reliability. This article provides a comprehensive overview of OIRS for VLC, including channel modeling, design techniques, and open issues. First, we present the characteristics of OIRS-reflected channels and introduce two practical models, namely, optics model and association model, which are then compared in terms of applicable conditions, configuration methods, and channel parameters. Next, under the more practically appealing association model, we discuss the main design techniques for OIRS-aided VLC systems, including beam alignment, channel estimation, and OIRS reflection optimization. Finally, open issues are identified to stimulate future research in this area.

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.16248 [pdf]

Combining Radiomics and Machine Learning Approaches for Objective ASD Diagnosis: Verifying White Matter Associations with ASD

Authors: Junlin Song, Yuzhuo Chen, Yuan Yao, Zetong Chen, Renhao Guo, Lida Yang, Xinyi Sui, Qihang Wang, Xijiao Li, Aihua Cao, Wei Li

Abstract: Autism Spectrum Disorder is a condition characterized by atypical brain development leading to impairments in social skills, communication abilities, repetitive behaviors, and sensory processing. There have been many studies combining brain MRI images with machine learning algorithms to achieve objective diagnosis of autism, but the correlation between white matter and autism has not been fully utilized. To address this gap, we develop a computer-aided diagnostic model focusing on white matter regions in brain MRI by employing radiomics and machine learning methods. This study introduced a MultiUNet model for segmenting white matter, leveraging the UNet architecture and utilizing manually segmented MRI images as the training data. Subsequently, we extracted white matter features using the Pyradiomics toolkit and applied different machine learning models such as Support Vector Machine, Random Forest, Logistic Regression, and K-Nearest Neighbors to predict autism. The prediction sets all exceeded 80% accuracy. Additionally, we employed a Convolutional Neural Network to analyze segmented white matter images, achieving a prediction accuracy of 86.84%. Notably, Support Vector Machine demonstrated the highest prediction accuracy at 89.47%. These findings not only underscore the efficacy of the models but also establish a link between white matter abnormalities and autism. Our study contributes to a comprehensive evaluation of various diagnostic models for autism and introduces a computer-aided diagnostic algorithm for early and objective autism diagnosis based on MRI white matter regions.

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.08306 [pdf, other]

Flight Path Optimization with Optimal Control Method

Authors: Gaofeng Su, Xi Cheng, Siyuan Feng, Ke Liu, Jilin Song, Jianan Chen, Chen Zhu, Hui Lin

Abstract: This paper addresses a crucial issue in the aviation world: how to optimize the trajectory and controls given to the aircraft in order to optimize flight time and fuel consumption. This study aims to provide elements of a response to this problem and to define, under certain simplifying assumptions, an optimal response, using Constrained Finite Time Optimal Control (CFTOC). The first step is to define the dynamic model of the aircraft in accordance with the controllable inputs and wind disturbances. Then we identify a precise optimization objective and implement an optimization program to solve it under simulated real flight conditions. Finally, the optimization result is validated and discussed in different scenarios.

Submitted 13 August, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.07260 [pdf]

A Supervised Information Enhanced Multi-Granularity Contrastive Learning Framework for EEG Based Emotion Recognition

Authors: Xiang Li, Jian Song, Zhigang Zhao, Chunxiao Wang, Dawei Song, Bin Hu

Abstract: This study introduces a novel Supervised Info-enhanced Contrastive Learning framework for EEG based Emotion Recognition (SI-CLEER). SI-CLEER employs multi-granularity contrastive learning to create robust EEG contextual representations, potentially improving emotion recognition effectiveness. Unlike existing methods solely guided by classification loss, we propose a joint learning model combining self-supervised contrastive learning loss and supervised classification loss. This model optimizes both loss functions, capturing subtle EEG signal differences specific to emotion detection. Extensive experiments demonstrate SI-CLEER's robustness and superior accuracy on the SEED dataset compared to state-of-the-art methods. Furthermore, we analyze electrode performance, highlighting the significance of central frontal and temporal brain region EEGs in emotion detection. This study offers a universally applicable approach with potential benefits for diverse EEG classification tasks.

Submitted 12 May, 2024; originally announced May 2024.

Comments: 5 pages, 3 figures, 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

arXiv:2404.14778 [pdf, other]

Channel Estimation for Optical Intelligent Reflecting Surface-Assisted VLC System: A Joint Space-Time Sampling Approach

Authors: Shiyuan Sun, Fang Yang, Weidong Mei, Jian Song, Zhu Han, Rui Zhang

Abstract: Optical intelligent reflecting surface (OIRS) has attracted increasing attention due to its capability of overcoming signal blockages in visible light communication (VLC), an emerging technology for the next-generation advanced transceivers. However, current works on OIRS predominantly assume known channel state information (CSI), which is essential to practical OIRS configuration. To bridge such a gap, this paper proposes a new and customized channel estimation protocol for OIRSs under the alignment-based channel model. Specifically, we first unveil OIRS spatial and temporal coherence characteristics and derive the coherence distance and the coherence time in closed form. Next, to achieve fast beam alignment over different coherence times, we propose to dynamically tune the rotational angles of the OIRS reflecting elements following a geometric optics-based non-uniform codebook. Given the above beam alignment, we propose an efficient joint space-time sampling-based algorithm to estimate the OIRS channel. In particular, we divide the OIRS into multiple subarrays based on the coherence distance and sequentially estimate their associated CSI, followed by a space-time interpolation to retrieve full CSI for other non-aligned transceiver antennas. Numerical results validate our theoretical analyses and demonstrate the efficacy of our proposed OIRS channel estimation scheme as compared to other benchmark schemes.

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.14706 [pdf, other]

Channel Estimation for Optical IRS-Assisted VLC System via Spatial Coherence

Authors: Shiyuan Sun, Fang Yang, Weidong Mei, Jian Song, Zhu Han, Rui Zhang

Abstract: Optical intelligent reflecting surface (OIRS) has been considered a promising technology for visible light communication (VLC) by constructing visual line-of-sight propagation paths to address the signal blockage issue. However, the existing works on OIRSs are mostly based on perfect channel state information (CSI), whose acquisition appears to be challenging due to the passive nature of the OIRS. To tackle this challenge, this paper proposes a customized channel estimation algorithm for OIRSs. Specifically, we first unveil the OIRS spatial coherence characteristics and derive the coherence distance in closed form. Based on this property, a spatial sampling-based algorithm is proposed to estimate the OIRS-reflected channel, by dividing the OIRS into multiple subarrays based on the coherence distance and sequentially estimating their associated CSI, followed by an interpolation to retrieve the full CSI. Simulation results validate the derived OIRS spatial coherence and demonstrate the efficacy of the proposed OIRS channel estimation algorithm.

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.03425 [pdf, other]

doi 10.1109/TGRS.2024.3417253

ChangeMamba: Remote Sensing Change Detection With Spatiotemporal State Space Model

Authors: Hongruixuan Chen, Jian Song, Chengxi Han, Junshi Xia, Naoto Yokoya

Abstract: Convolutional neural networks (CNNs) and Transformers have made impressive progress in the field of remote sensing change detection (CD). However, both architectures have inherent shortcomings: CNNs are constrained by a limited receptive field that may hinder their ability to capture broader spatial contexts, while Transformers are computationally intensive, making them costly to train and deploy on large datasets. Recently, the Mamba architecture, based on state space models, has shown remarkable performance in a series of natural language processing tasks, which can effectively compensate for the shortcomings of the above two architectures. In this paper, we explore for the first time the potential of the Mamba architecture for remote sensing CD tasks. We tailor the corresponding frameworks, called MambaBCD, MambaSCD, and MambaBDA, for binary change detection (BCD), semantic change detection (SCD), and building damage assessment (BDA), respectively. All three frameworks adopt the cutting-edge Visual Mamba architecture as the encoder, which allows full learning of global spatial contextual information from the input images. For the change decoder, which is available in all three architectures, we propose three spatio-temporal relationship modeling mechanisms, which can be naturally combined with the Mamba architecture and fully utilize its attribute to achieve spatio-temporal interaction of multi-temporal features, thereby obtaining accurate change information. On five benchmark datasets, our proposed frameworks outperform current CNN- and Transformer-based approaches without using any complex training strategies or tricks, fully demonstrating the potential of the Mamba architecture in CD tasks. Further experiments show that our architecture is quite robust to degraded data. The source code will be available at https://github.com/ChenHongruixuan/MambaCD

Submitted 26 July, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

Comments: Accepted by IEEE TGRS: https://ieeexplore.ieee.org/document/10565926

Journal ref: IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1-20, 2024, Art no. 4409720

arXiv:2403.17392 [pdf, other]

Natural-artificial hybrid swarm: Cyborg-insect group navigation in unknown obstructed soft terrain

Authors: Yang Bai, Phuoc Thanh Tran Ngoc, Huu Duoc Nguyen, Duc Long Le, Quang Huy Ha, Kazuki Kai, Yu Xiang See To, Yaosheng Deng, Jie Song, Naoki Wakamiya, Hirotaka Sato, Masaki Ogura

Abstract: Navigating multi-robot systems in complex terrains has always been a challenging task. This is due to the inherent limitations of traditional robots in collision avoidance, adaptation to unknown environments, and sustained energy efficiency. In order to overcome these limitations, this research proposes a solution by integrating living insects with miniature electronic controllers to enable robotic-like programmable control, and proposing a novel control algorithm for swarming. Although these creatures, called cyborg insects, have the ability to instinctively avoid collisions with neighbors and obstacles while adapting to complex terrains, there is a lack of literature on the control of multi-cyborg systems. This research gap is due to the difficulty in coordinating the movements of a cyborg system under the presence of insects' inherent individual variability in their reactions to control input. In response to this issue, we propose a novel swarm navigation algorithm addressing these challenges. The effectiveness of the algorithm is demonstrated through an experimental validation in which a cyborg swarm was successfully navigated through an unknown sandy field with obstacles and hills. This research contributes to the domain of swarm robotics and showcases the potential of integrating biological organisms with robotics and control theory to create more intelligent autonomous systems with real-world applications.

Submitted 27 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.10040 [pdf, other]

Histo-Genomic Knowledge Distillation For Cancer Prognosis From Histopathology Whole Slide Images

Authors: Zhikang Wang, Yumeng Zhang, Yingxue Xu, Seiya Imoto, Hao Chen, Jiangning Song

Abstract: Histo-genomic multi-modal methods have recently emerged as a powerful paradigm, demonstrating significant potential for improving cancer prognosis. However, genome sequencing, unlike histopathology imaging, is still not widely accessible in underdeveloped regions, limiting the application of these multi-modal approaches in clinical settings. To address this, we propose a novel Genome-informed Hyper-Attention Network, termed G-HANet, which is capable of effectively distilling the histo-genomic knowledge during training to elevate uni-modal whole slide image (WSI)-based inference for the first time. Compared with traditional knowledge distillation methods (i.e., teacher-student architecture) in other tasks, our end-to-end model is superior in terms of training efficiency and learning cross-modal interactions. Specifically, the network comprises the cross-modal associating branch (CAB) and hyper-attention survival branch (HSB). Through the genomic data reconstruction from WSIs, CAB effectively distills the associations between functional genotypes and morphological phenotypes and offers insights into the gene expression profiles in the feature space. Subsequently, HSB leverages the distilled histo-genomic associations as well as the generated morphology-based weights to achieve the hyper-attention modeling of the patients from both histopathology and genomic perspectives to improve cancer prognosis. Extensive experiments are conducted on five TCGA benchmarking datasets and the results demonstrate that G-HANet significantly outperforms the state-of-the-art WSI-based methods and achieves competitive performance with genome-based and multi-modal methods. G-HANet is expected to be explored as a useful tool by the research community to address the current bottleneck of insufficient histo-genomic data pairing in the context of cancer prognosis and precision oncology.

Submitted 18 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

arXiv:2403.05136 [pdf, other]

DeRO: Dead Reckoning Based on Radar Odometry With Accelerometers Aided for Robot Localization

Authors: Hoang Viet Do, Yong Hun Kim, Joo Han Lee, Min Ho Lee, Jin Woo Song

Abstract: In this paper, we propose a radar odometry structure that directly utilizes radar velocity measurements for dead reckoning while maintaining its ability to update estimations within the Kalman filter framework. Specifically, we employ the Doppler velocity obtained by a 4D Frequency Modulated Continuous Wave (FMCW) radar in conjunction with gyroscope data to calculate poses. This approach helps mitigate high drift resulting from accelerometer biases and double integration. Instead, tilt angles measured by gravitational force are utilized alongside relative distance measurements from radar scan matching for the filter's measurement update. Additionally, to further enhance the system's accuracy, we estimate and compensate for the radar velocity scale factor. The performance of the proposed method is verified through five real-world open-source datasets. The results demonstrate that our approach reduces position error by 47% and rotation error by 52% on average compared to the state-of-the-art radar-inertial fusion method in terms of absolute trajectory error.

Submitted 8 March, 2024; originally announced March 2024.

Comments: 9 pages, 5 figures, 1 table, conference

ACM Class: I.2.9

arXiv:2401.09019 [pdf, other]

Change Detection Between Optical Remote Sensing Imagery and Map Data via Segment Anything Model (SAM)

Authors: Hongruixuan Chen, Jian Song, Naoto Yokoya

Abstract: Unsupervised multimodal change detection is pivotal for time-sensitive tasks and comprehensive multi-temporal Earth monitoring. In this study, we explore unsupervised multimodal change detection between two key remote sensing data sources: optical high-resolution imagery and OpenStreetMap (OSM) data. Specifically, we propose to utilize the vision foundation model Segment Anything Model (SAM) to address our task. Leveraging SAM's exceptional zero-shot transfer capability, high-quality segmentation maps of optical images can be obtained. Thus, we can directly compare these two heterogeneous data forms in the so-called segmentation domain. We then introduce two strategies for guiding SAM's segmentation process: the 'no-prompt' and 'box/mask prompt' methods. The two strategies are designed to detect land-cover changes in general scenarios and to identify new land-cover objects within existing backgrounds, respectively. Experimental results on three datasets indicate that the proposed approach can achieve more competitive results compared to representative unsupervised multimodal change detection methods.

Submitted 17 January, 2024; originally announced January 2024.

arXiv:2401.02771 [pdf, other]

Powerformer: A Section-adaptive Transformer for Power Flow Adjustment

Authors: Kaixuan Chen, Wei Luo, Shunyu Liu, Yaoquan Wei, Yihe Zhou, Yunpeng Qing, Quan Zhang, Jie Song, Mingli Song

Abstract: In this paper, we present a novel transformer architecture tailored for learning robust power system state representations, which strives to optimize power dispatch for the power flow adjustment across different transmission sections. Specifically, our proposed approach, named Powerformer, develops a dedicated section-adaptive attention mechanism, separating itself from the self-attention used in conventional transformers. This mechanism effectively integrates power system states with transmission section information, which facilitates the development of robust state representations. Furthermore, by considering the graph topology of power system and the electrical attributes of bus nodes, we introduce two customized strategies to further enhance the expressiveness: graph neural network propagation and multi-factor attention mechanism. Extensive evaluations are conducted on three power system scenarios, including the IEEE 118-bus system, a realistic 300-bus system in China, and a large-scale European system with 9241 buses, where Powerformer demonstrates its superior performance over several baseline methods.

Submitted 30 January, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

Comments: 8 figures

arXiv:2312.16003 [pdf, other]

Blind Frequency-Domain Equalization Using Vector-Quantized Variational Autoencoders

Authors: Jinxiang Song, Vincent Lauinger, Christian Häger, Jochen Schröder, Alexandre Graell i Amat, Laurent Schmalen, Henk Wymeersch

Abstract: We propose a novel frequency-domain blind equalization scheme for coherent optical communications. The method is shown to achieve similar performance to its recently proposed time-domain counterpart with lower computational complexity, while outperforming the commonly used CMA-based equalizers.

Submitted 26 December, 2023; originally announced December 2023.

arXiv:2312.13654 [pdf, other]

Free Space Optical Integrated Sensing and Communication Based on DCO-OFDM: Performance Metrics and Resource Allocation

Authors: Yunfeng Wen, Fang Yang, Jian Song, Zhu Han

Abstract: As one of the six usage scenarios of the sixth generation (6G) mobile communication system, integrated sensing and communication (ISAC) has garnered considerable attention, and numerous studies have been conducted on radio-frequency (RF)-ISAC. Benefitting from the communication and sensing capabilities of an optical system, free space optical (FSO)-ISAC becomes a potential complement to RF-ISAC. In this paper, a direct-current-biased optical orthogonal frequency division multiplexing (DCO-OFDM) scheme is proposed for FSO-ISAC. To derive the spectral efficiency for communication and the Fisher information for sensing as performance metrics, we model the clipping noise of DCO-OFDM as additive colored Gaussian noise to obtain the expression of the signal-to-noise ratio. Based on the derived performance metrics, joint power allocation problems are formulated for both communication-centric and sensing-centric scenarios. In addition, the non-convex joint optimization problems are decomposed into sub-problems for DC bias and subcarriers, which can be solved by block coordinate descent algorithms. Furthermore, numerical simulations demonstrate the proposed algorithms and reveal the trade-off between communication and sensing functionalities of the OFDM-based FSO-ISAC system.

Submitted 21 December, 2023; originally announced December 2023.

Comments: 13 pages, 8 figures

arXiv:2312.13640 [pdf, other]

Optical Integrated Sensing and Communication: Architectures, Potentials and Challenges

Authors: Yunfeng Wen, Fang Yang, Jian Song, Zhu Han

Abstract: Integrated sensing and communication (ISAC) is viewed as a crucial component of future mobile networks and has gained much interest in both academia and industry. Similar to the emergence of radio-frequency (RF) ISAC, the integration of free space optical communication and optical sensing yields optical ISAC (O-ISAC), which is regarded as a powerful complement to its RF counterpart. In this article, we first introduce the generalized system structure of O-ISAC, and then elaborate on three advantages of O-ISAC, i.e., increasing communication rate, enhancing sensing precision, and reducing interference. Next, waveform design and resource allocation of O-ISAC are discussed based on pulsed waveform, constant-modulus waveform, and multi-carrier waveform. Furthermore, we put forward future trends and challenges of O-ISAC, which are expected to provide some valuable directions for future research.

Submitted 10 March, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

Comments: 7 pages, 5 figures

arXiv:2311.09839 [pdf, other]

Load Data Valuation in Multi-Energy Systems: An End-to-End Approach

Authors: Yangze Zhou, Qingsong Wen, Jie Song, Xueyuan Cui, Yi Wang

Abstract: Accurate load forecasting serves as the foundation for the flexible operation of multi-energy systems (MES). Multi-energy loads are tightly coupled and exhibit significant uncertainties. Many works focus on enhancing forecasting accuracy by leveraging cross-sector information. However, data owners may not be motivated to share their data unless it leads to substantial benefits. Ensuring a reasonable data valuation can encourage them to share their data willingly. This paper presents an end-to-end framework to quantify multi-energy load data value by integrating forecasting and decision processes. To address optimization problems with integer variables, a two-stage end-to-end model solution is proposed. Moreover, a profit allocation strategy based on contribution to cost savings is investigated to encourage data sharing in MES. The experimental results demonstrate a significant decrease in operation costs, suggesting that the proposed valuation approach more effectively extracts the inherent data value than traditional methods. According to the proposed incentive mechanism, all sectors can benefit from data sharing by improving forecasting accuracy or receiving economic compensation.

Submitted 16 November, 2023; originally announced November 2023.

Comments: 10 pages

arXiv:2311.00372 [pdf, ps, other]

Zeroth-Order Feedback-Based Optimization for Distributed Demand Response

Authors: Ruiyang Jin, Yujie Tang, Jie Song

Abstract: Distributed demand response is a typical distributed optimization problem that requires coordination among multiple agents to satisfy demand response requirements. However, existing distributed algorithms for this problem still face challenges such as unknown system models, nonconvexity, privacy issues, etc. To address these challenges, we propose and analyze two distributed algorithms, in which the agents do not share their information and instead perform local updates using zeroth-order feedback information to estimate the gradient of the global objective function. One algorithm applies to problems with general convex and compact feasible sets but has higher oracle complexity bounded by $O(d/ε^2)$, while the other algorithm achieves lower complexity bound $O(d/ε)$ but is only applicable to problems with box constraints. We conduct empirical experiments to validate their performance.

Submitted 1 November, 2023; originally announced November 2023.

arXiv:2310.20117 [pdf, other]

Refined Equivalent Pinhole Model for Large-scale 3D Reconstruction from Spaceborne CCD Imagery

Authors: Hong Danyang, Yu Anzhu, Ji Song, Cao Xuefeng, Quan Yujun, Guo Wenyue, Qiu Chunping

Abstract: In this study, we present a large-scale earth surface reconstruction pipeline for linear-array charge-coupled device (CCD) satellite imagery. While mainstream satellite image-based reconstruction approaches perform exceptionally well, the rational functional model (RFM) is subject to several limitations. For example, the RFM has no rigorous physical interpretation and differs significantly from the pinhole imaging model; hence, it cannot be directly applied to learning-based 3D reconstruction networks and to more novel reconstruction pipelines in computer vision. Hence, in this study, we introduce a method in which the RFM is equivalent to the pinhole camera model (PCM), meaning that the internal and external parameters of the pinhole camera are used instead of the rational polynomial coefficient parameters. We then derive an error formula for this equivalent pinhole model for the first time, demonstrating the influence of the image size on the accuracy of the reconstruction. In addition, we propose a polynomial image refinement model that minimizes equivalent errors via the least squares method. The experiments were conducted using four image datasets: WHU-TLC, DFC2019, ISPRS-ZY3, and GF7. The results demonstrated that the reconstruction accuracy was proportional to the image size. Our polynomial image refinement model significantly enhanced the accuracy and completeness of the reconstruction, and achieved more significant improvements for larger-scale images.

Submitted 30 October, 2023; originally announced October 2023.

Comments: 24 pages

arXiv:2310.17505 [pdf, other]

Free Space Optical Communication for Inter-Satellite Link: Architecture, Potentials and Trends

Authors: Guanhua Wang, Fang Yang, Jian Song, Zhu Han

Abstract: The sixth-generation (6G) network is expected to achieve global coverage based on the space-air-ground integrated network, and the latest satellite network will play an important role in it. The introduction of inter-satellite links (ISLs) can significantly improve the throughput of the satellite network, and has recently attracted considerable attention from both academia and industry. In this paper, we illustrate the advantages of using the laser for ISLs due to its longer communication distance, higher data speed, and stronger security. Specifically, space-borne laser terminals with the acquisition, pointing and tracking mechanism which realize long-distance communication are illustrated, advanced modulation and multiplexing modes that make high communication rates possible are introduced, and the security of ISLs ensured by the characteristics of both laser and the optical channel is also analyzed. Moreover, some open issues such as advanced optical beam steering, routing and scheduling algorithm, and integrated sensing and communication are discussed to direct future research.

Submitted 26 October, 2023; originally announced October 2023.

arXiv:2310.12551 [pdf, other]

Iterative PnP and its application in 3D-2D vascular image registration for robot navigation

Authors: Jingwei Song, Keke Yang, Zheng Zhang, Meng Li, Tuoyu Cao, Maani Ghaffari

Abstract: This paper reports on a new real-time robot-centered 3D-2D vascular image alignment algorithm, which is robust to outliers and can align nonrigid shapes. Few works have managed to achieve both real-time and accurate performance for vascular intervention robots. This work bridges high-accuracy 3D-2D registration techniques and computational efficiency requirements in intervention robot applications. We categorize centerline-based vascular 3D-2D image registration problems as an iterative Perspective-n-Point (PnP) problem and propose to use the Levenberg-Marquardt solver on the Lie manifold. Then, the recently developed Reproducing Kernel Hilbert Space (RKHS) algorithm is introduced to overcome the "big-to-small" problem in typical robotic scenarios. Finally, an iterative reweighted least squares is applied to solve RKHS-based formulation efficiently. Experiments indicate that the proposed algorithm processes registration over 50 Hz (rigid) and 20 Hz (nonrigid) and obtains competing registration accuracy similar to other works. Results indicate that our Iterative PnP is suitable for future vascular intervention robot applications.

Submitted 11 January, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

Comments: Submitted to ICRA 2024. Errors in Eq. 4 and Eq. 6 have been corrected. Updates include some minor improvements in Section II.

arXiv:2310.07200 [pdf, other]

Input-Output Relation and Low-Complexity Receiver Design for CP-OTFS Systems with Doppler Squint

Authors: Xuehan Wang, Xu Shi, Jintao Wang, Jian Song

Abstract: In orthogonal time frequency space (OTFS) systems, the impact of frequency-dependent Doppler which is referred to as the Doppler squint effect (DSE) is accumulated through longer duration, whose negligence has prevented OTFS systems from exploiting the performance superiority. In this paper, practical OFDM system using cyclic prefix time guard interval (CP-OFDM)-based OTFS systems with DSE are adopted. Cyclic prefix (CP) length is analyzed while the input-output relation considering DSE is derived. By deploying two prefix OFDM symbols, the channel estimation can be easily divided into three parts as delay detection, Doppler extraction and gain estimation. The linear equalization scheme is adopted taking the block diagonal property of the channel matrix into account, which completes the low-complexity receiver design. Simulation results confirm the significance of DSE and the considerable performance of the proposed low-complexity receiver scheme considering DSE.

Submitted 11 October, 2023; originally announced October 2023.

Comments: This article has been accepted by the 2023 IEEE Global Communication Conference workshops (GC WKshps)

arXiv:2310.01799 [pdf, other]

SMRD: SURE-based Robust MRI Reconstruction with Diffusion Models

Authors: Batu Ozturkler, Chao Liu, Benjamin Eckart, Morteza Mardani, Jiaming Song, Jan Kautz

Abstract: Diffusion models have recently gained popularity for accelerated MRI reconstruction due to their high sample quality. They can effectively serve as rich data priors while incorporating the forward model flexibly at inference time, and they have been shown to be more robust than unrolled methods under distribution shifts. However, diffusion models require careful tuning of inference hyperparameters on a validation set and are still sensitive to distribution shifts during testing. To address these challenges, we introduce SURE-based MRI Reconstruction with Diffusion models (SMRD), a method that performs test-time hyperparameter tuning to enhance robustness during testing. SMRD uses Stein's Unbiased Risk Estimator (SURE) to estimate the mean squared error of the reconstruction during testing. SURE is then used to automatically tune the inference hyperparameters and to set an early stopping criterion without the need for validation tuning. To the best of our knowledge, SMRD is the first to incorporate SURE into the sampling stage of diffusion models for automatic hyperparameter selection. SMRD outperforms diffusion model baselines on various measurement noise levels, acceleration factors, and anatomies, achieving a PSNR improvement of up to 6 dB under measurement noise. The code is publicly available at https://github.com/NVlabs/SMRD.

Submitted 18 October, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

Comments: MICCAI 2023

arXiv:2310.01453 [pdf, other]

Enhancing Secrecy Capacity in PLS Communication with NORAN based on Pilot Information Codebooks

Authors: Yebo Gu, Tao Shen, Jian Song, Qingbo Wang

Abstract: In recent research, non-orthogonal artificial noise (NORAN) has been proposed as an alternative to orthogonal artificial noise (AN). However, NORAN introduces additional noise into the channel, which reduces the capacity of the legitimate channel (LC). At the same time, selecting a NORAN design with ideal security performance from a large number of design options is also a challenging problem. To address these two issues, a novel NORAN based on a pilot information codebook is proposed in this letter. The codebook associates different suboptimal NORANs with pilot information as the key under different channel state information (CSI). The receiver interrogates the codebook using the pilot information to obtain the NORAN that the transmitter will transmit in the next moment, in order to eliminate the NORAN when receiving information. Therefore, NORAN based on pilot information codebooks can improve the secrecy capacity (SC) of the communication system by directly using suboptimal NORAN design schemes without increasing the noise in the LC. Numerical simulations and analyses show that the introduction of NORAN with a novel design using pilot information codebooks significantly enhances the security and improves the SC of the communication system.

Submitted 2 October, 2023; originally announced October 2023.

arXiv:2310.00413 [pdf, other]

SSIF: Learning Continuous Image Representation for Spatial-Spectral Super-Resolution

Authors: Gengchen Mai, Ni Lao, Weiwei Sun, Yuchi Ma, Jiaming Song, Chenlin Meng, Hongxu Ma, Jinmeng Rao, Ziyuan Li, Stefano Ermon

Abstract: Existing digital sensors capture images at fixed spatial and spectral resolutions (e.g., RGB, multispectral, and hyperspectral images), and each combination requires bespoke machine learning models. Neural Implicit Functions partially overcome the spatial resolution challenge by representing an image in a resolution-independent way. However, they still operate at fixed, pre-defined spectral resolutions. To address this challenge, we propose Spatial-Spectral Implicit Function (SSIF), a neural implicit model that represents an image as a function of both continuous pixel coordinates in the spatial domain and continuous wavelengths in the spectral domain. We empirically demonstrate the effectiveness of SSIF on two challenging spatio-spectral super-resolution benchmarks. We observe that SSIF consistently outperforms state-of-the-art baselines even when the baselines are allowed to train separate models at each spectral resolution. We show that SSIF generalizes well to both unseen spatial resolutions and spectral resolutions. Moreover, SSIF can generate high-resolution images that improve the performance of downstream tasks (e.g., land use classification) by 1.7%-7%.

Submitted 30 September, 2023; originally announced October 2023.

MSC Class: 68T07; 68T45

ACM Class: I.4.10; I.2.10; I.4.6

arXiv:2309.07198 [pdf, other]

doi 10.1364/OL.515429

Temporal compressive edge imaging enabled by a lensless diffuser camera

Authors: Ze Zheng, Baolei Liu, Jiaqi Song, Lei Ding, Xiaolan Zhong, David Mcgloin, Fan Wang

Abstract: Lensless imagers based on diffusers or encoding masks enable high-dimensional imaging from a single shot measurement and have been applied in various applications. However, to further extract image information such as edge detection, conventional post-processing filtering operations are needed after the reconstruction of the original object images in the diffuser imaging systems. Here, we present the concept of a temporal compressive edge detection method based on a lensless diffuser camera, which can directly recover a time sequence of edge images of a moving object from a single-shot measurement, without further post-processing steps. Our approach provides higher image quality during edge detection, compared with the conventional post-processing method. We demonstrate the effectiveness of this approach by both numerical simulation and experiments. The proof-of-concept approach can be further developed with other image post-process operations or versatile computer vision assignments toward task-oriented intelligent lensless imaging systems.

Submitted 13 September, 2023; originally announced September 2023.

Comments: 5 pages, 4 figures

Journal ref: Optics Letters, 49(11), 3058-3061 (2024)

arXiv:2307.08950 [pdf, other]

Deep Physics-Guided Unrolling Generalization for Compressed Sensing

Authors: Bin Chen, Jiechong Song, Jingfen Xie, Jian Zhang

Abstract: By absorbing the merits of both the model- and data-driven methods, deep physics-engaged learning scheme achieves high-accuracy and interpretable image reconstruction. It has attracted growing attention and become the mainstream for inverse imaging tasks. Focusing on the image compressed sensing (CS) problem, we find the intrinsic defect of this emerging paradigm, widely implemented by deep algorithm-unrolled networks, in which more plain iterations involving real physics will bring enormous computation cost and long inference time, hindering their practical application. A novel deep Physics-guided unRolled recovery Learning (PRL) framework is proposed by generalizing the traditional iterative recovery model from image domain (ID) to the high-dimensional feature domain (FD). A compact multiscale unrolling architecture is then developed to enhance the network capacity and keep real-time inference speeds. Taking two different perspectives of optimization and range-nullspace decomposition, instead of building an algorithm-specific unrolled network, we provide two implementations: PRL-PGD and PRL-RND. Experiments exhibit the significant performance and efficiency leading of PRL networks over other state-of-the-art methods with a large potential for further improvement and real application to other inverse imaging problems or optimization models.

Submitted 17 July, 2023; originally announced July 2023.

Comments: Accepted by International Journal of Computer Vision (IJCV) 2023

arXiv:2306.16060 [pdf, other]

doi 10.1109/TIP.2023.3263100

Dynamic Path-Controllable Deep Unfolding Network for Compressive Sensing

Authors: Jiechong Song, Bin Chen, Jian Zhang

Abstract: Deep unfolding network (DUN) that unfolds the optimization algorithm into a deep neural network has achieved great success in compressive sensing (CS) due to its good interpretability and high performance. Each stage in DUN corresponds to one iteration in optimization. At the test time, all the sampling images generally need to be processed by all stages, which comes at a price of computation burden and is also unnecessary for the images whose contents are easier to restore. In this paper, we focus on CS reconstruction and propose a novel Dynamic Path-Controllable Deep Unfolding Network (DPC-DUN). DPC-DUN with our designed path-controllable selector can dynamically select a rapid and appropriate route for each image and is slimmable by regulating different performance-complexity tradeoffs. Extensive experiments show that our DPC-DUN is highly flexible and can provide excellent performance and dynamic adjustment to get a suitable tradeoff, thus addressing the main requirements to become appealing in practice. Codes are available at https://github.com/songjiechong/DPC-DUN.

Submitted 19 February, 2024; v1 submitted 28 June, 2023; originally announced June 2023.

Comments: TIP, 2023

arXiv:2306.14047 [pdf, other]

Towards Optimal Pricing of Demand Response -- A Nonparametric Constrained Policy Optimization Approach

Authors: Jun Song, Chaoyue Zhao

Abstract: Demand response (DR) has been demonstrated to be an effective method for reducing peak load and mitigating uncertainties on both the supply and demand sides of the electricity market. One critical question for DR research is how to appropriately adjust electricity prices in order to shift electrical load from peak to off-peak hours. In recent years, reinforcement learning (RL) has been used to address the price-based DR problem because it is a model-free technique that does not necessitate the identification of models for end-use customers. However, the majority of RL methods cannot guarantee the stability and optimality of the learned pricing policy, which is undesirable in safety-critical power systems and may result in high customer bills. In this paper, we propose an innovative nonparametric constrained policy optimization approach that improves optimality while ensuring stability of the policy update, by removing the restrictive assumption on policy representation that the majority of the RL literature adopts: the policy must be parameterized or fall into a certain distribution class. We derive a closed-form expression of optimal policy update for each iteration and develop an efficient on-policy actor-critic algorithm to address the proposed constrained policy optimization problem. The experiments on two DR cases show the superior performance of our proposed nonparametric constrained policy optimization method compared with state-of-the-art RL algorithms.

Submitted 24 June, 2023; originally announced June 2023.

Comments: 2023 IEEE PES General Meeting. arXiv admin note: text overlap with arXiv:2006.07815

arXiv:2304.13986 [pdf, other]

Optimization-Inspired Cross-Attention Transformer for Compressive Sensing

Authors: Jiechong Song, Chong Mou, Shiqi Wang, Siwei Ma, Jian Zhang

Abstract: By integrating certain optimization solvers with deep neural networks, deep unfolding network (DUN) with good interpretability and high performance has attracted growing attention in compressive sensing (CS). However, existing DUNs often improve the visual quality at the price of a large number of parameters and have the problem of feature information loss during iteration. In this paper, we propose an Optimization-inspired Cross-attention Transformer (OCT) module as an iterative process, leading to a lightweight OCT-based Unfolding Framework (OCTUF) for image CS. Specifically, we design a novel Dual Cross Attention (Dual-CA) sub-module, which consists of an Inertia-Supplied Cross Attention (ISCA) block and a Projection-Guided Cross Attention (PGCA) block. ISCA block introduces multi-channel inertia forces and increases the memory effect by a cross attention mechanism between adjacent iterations. And, PGCA block achieves an enhanced information interaction, which introduces the inertia force into the gradient descent step through a cross attention block. Extensive CS experiments manifest that our OCTUF achieves superior performance compared to state-of-the-art methods while training lower complexity. Codes are available at https://github.com/songjiechong/OCTUF.

Submitted 27 April, 2023; originally announced April 2023.

Comments: CVPR 2023

arXiv:2304.05617 [pdf, other]

AutoRepair: Automated Repair for AI-Enabled Cyber-Physical Systems under Safety-Critical Conditions

Authors: Deyun Lyu, Jiayang Song, Zhenya Zhang, Zhijie Wang, Tianyi Zhang, Lei Ma, Jianjun Zhao

Abstract: Cyber-Physical Systems (CPS) have been widely deployed in safety-critical domains such as transportation, power and energy. Recently, there has been an increasing demand for employing deep neural networks (DNNs) in CPS for more intelligent control and decision making in sophisticated industrial safety-critical conditions, giving birth to the class of DNN controllers. However, due to the inherent uncertainty and opaqueness of DNNs, concerns about the safety of DNN-enabled CPS are also surging. In this work, we propose an automated framework named AutoRepair that, given a safety requirement, identifies unsafe control behavior in a DNN controller and repairs it through an optimization-based method. Given an unsafe signal of system execution, AutoRepair iteratively explores the control decision space and searches for the optimal corrections for the DNN controller in order to satisfy the safety requirements. We conduct a comprehensive evaluation of AutoRepair on 6 instances of industry-level DNN-enabled CPS from different safety-critical domains. Evaluation results show that AutoRepair successfully repairs critical safety issues in the DNN controllers, and significantly improves the reliability of CPS.

Submitted 12 April, 2023; originally announced April 2023.

arXiv:2303.02259 [pdf, other]

Graph-based Simultaneous Coverage and Exploration Planning for Fast Multi-robot Search

Authors: Indraneel Patil, Rachel Zheng, Charvi Gupta, Jaekyung Song, Narendar Sriram, Katia Sycara

Abstract: In large unknown environments, search operations can be much more time-efficient with the use of multi-robot fleets by parallelizing efforts. This means robots must efficiently perform collaborative mapping (exploration) while simultaneously searching an area for victims (coverage). Previous simultaneous mapping and planning techniques treat these problems as separate and do not take advantage of the possibility for a unified approach. We propose a novel exploration-coverage planner which bridges the mapping and search domains by growing sets of random trees rooted upon a pose graph produced through mapping to generate points of interest, or tasks. Furthermore, it is important for the robots to first prioritize high information tasks to locate the greatest number of victims in minimum time by balancing coverage and exploration, which current methods do not address. Towards this goal, we also present a new multi-robot task allocator that formulates a notion of a hierarchical information heuristic for time-critical collaborative search. Our results show that our algorithm produces 20% more coverage efficiency, defined as average covered area per second, compared to the existing state-of-the-art. Our algorithms and the rest of our multi-robot search stack are based on ROS and made open source.

Submitted 3 March, 2023; originally announced March 2023.

Comments: Submitted to IROS 2023 on 1st March

arXiv:2302.14291 [pdf]

Battery Valuation and Management for Battery Swapping Station with an Intertemporal Framework

Authors: Xinjiang Chen, Yu Yang, Jianxiao Wang, Jie Song, Guannan He

Abstract: Battery swapping as a business model for battery energy storage (BES) has great potential in future integrated low-carbon energy and transportation systems. However, frequent battery swapping will inevitably accelerate battery degradation and shorten the battery life accordingly. To model the tradeoff of BES use between energy and transportation applications coupled by battery swapping, we develop a life-cycle decision model that coordinates battery charging and swapping. This model is derived based on an improved intertemporal decision framework, in which the optimal marginal degradation cost (MDC) of BES is determined to maximize the BES benefit across time and application. The proposed framework and model are applied to manage a battery swapping station that simultaneously provides battery swapping services to electric vehicle customers and provides flexibility service to the power grid, including energy arbitrage and reserve. The case study shows that while the end of the physical life of BES occurs faster with battery swapping, the economic life becomes considerably longer. The results also reveal that the optimal MDC depends on the battery values in each application, and we analyze how the battery swapping price affects the optimal MDC and battery life. The proposed framework and model can also provide decision support for on-demand BES service, such as battery trading, renting and secondary use.

Submitted 27 February, 2023; originally announced February 2023.

arXiv:2302.11687 [pdf, other]

Blind Channel Equalization Using Vector-Quantized Variational Autoencoders

Authors: Jinxiang Song, Vincent Lauinger, Yibo Wu, Christian Häger, Jochen Schröder, Alexandre Graell i Amat, Laurent Schmalen, Henk Wymeersch

Abstract: State-of-the-art high-spectral-efficiency communication systems employ high-order modulation formats coupled with high symbol rates to accommodate the ever-growing demand for data rate-hungry applications. However, such systems are more vulnerable to linear and nonlinear transmission impairments, and it is important to mitigate the performance loss via digital signal processing. In this paper, we propose a novel machine learning approach for blind channel equalization and estimation using the vector quantized (VQ) variational autoencoder (VAE) framework. The proposed approach generalizes the applicability of the conventional VAE-based equalizer to nonlinear systems employing high-order modulation formats by introducing a codebook component and an associated novel loss function. We evaluate the performance of the proposed method over a linear additive white Gaussian noise channel with intersymbol interference and two nonlinear scenarios. Simulation results show that the proposed method can achieve similar performance as a data aided equalizer using the minimum mean square error (MMSE) criterion, and outperforms the blind constant modulus algorithm (CMA) and the VAE-based channel equalizer. Furthermore, we show that for the linear channel, the proposed scheme exhibits better convergence properties than the MMSE-based, the CMA-based, and the VAE-based equalizers in terms of both convergence speed and robustness to variations in training batch size and learning rate.

Submitted 22 February, 2023; originally announced February 2023.

Comments: Submitted to Transactions on Communications

arXiv:2302.07269 [pdf, other]

doi 10.1364/OE.486290

Dual-mode adaptive-SVD ghost imaging

Authors: Dajing Wang, Baolei Liu, Jiaqi Song, Yao Wang, Xuchen Shan, Fan Wang

Abstract: In this paper, we present a dual-mode adaptive singular value decomposition ghost imaging (A-SVD GI), which can be easily switched between the modes of imaging and edge detection. It can adaptively localize the foreground pixels via a threshold selection method. Then only the foreground region is illuminated by the singular value decomposition (SVD)-based patterns, consequently retrieving high-quality images at lower sampling ratios. By changing the selecting range of foreground pixels, the A-SVD GI can be switched to the mode of edge detection to directly reveal the edge of objects, without needing the original image. We investigate the performance of these two modes through both numerical simulations and experiments. We also develop a single-round scheme to halve measurement numbers in experiments, instead of separately illuminating positive and negative patterns in traditional methods. The binarized SVD patterns, generated by the spatial dithering method, are modulated by a digital micromirror device (DMD) to speed up the data acquisition. This dual-mode A-SVD GI can be applied in various applications, such as remote sensing or target recognition, and could be further extended for multi-modality functional imaging/detection.

Submitted 14 February, 2023; originally announced February 2023.

arXiv:2302.06156 [pdf, other]

On the Doppler Squint Effect in OTFS Systems over Doubly-Dispersive Channels: Modeling and Evaluation

Authors: Xuehan Wang, Xu Shi, Jintao Wang, Jian Song

Abstract: Extensive work has demonstrated the excellent performance of orthogonal time frequency space (OTFS) modulation in high-mobility scenarios. Time-variant wideband channel estimation serves as one of the key components of OTFS receivers since the data detection requires accurate channel state information (CSI). In practical wideband OTFS systems, the Doppler shift brought by the high mobility is frequency-dependent, which is referred to as the Doppler Squint Effect (DSE). Unfortunately, DSE was ignored in all prior estimation schemes employed in OTFS systems, which leads to severe performance loss in channel estimation and the consequent data detection. In this paper, we investigate DSE of wideband time-variant channel in delay-Doppler domain and concentrate on the characterization of OTFS channel coefficients considering DSE. The formulation and evaluation of OTFS input-output relationship are provided for both ideal and rectangular waveforms considering DSE. The channel estimation is therefore formulated as a sparse signal recovery problem and an orthogonal matching pursuit (OMP)-based scheme is adopted to solve it. Simulation results confirm the significance of DSE and the performance superiority compared with traditional channel estimation approaches ignoring DSE.

Submitted 13 February, 2023; originally announced February 2023.

arXiv:2301.02378 [pdf, other]

doi 10.1016/j.ymssp.2023.110668

Deep learning for full-field ultrasonic characterization

Authors: Yang Xu, Fatemeh Pourahmadian, Jian Song, Conglin Wang

Abstract: This study takes advantage of recent advances in machine learning to establish a physics-based data analytic platform for distributed reconstruction of mechanical properties in layered components from full waveform data. In this vein, two logics, namely the direct inversion and physics-informed neural networks (PINNs), are explored. The direct inversion entails three steps: (i) spectral denoising and differentiation of the full-field data, (ii) building appropriate neural maps to approximate the profile of unknown physical and regularization parameters on their respective domains, and (iii) simultaneous training of the neural networks by minimizing the Tikhonov-regularized PDE loss using data from (i). PINNs furnish efficient surrogate models of complex systems with predictive capabilities via multitask learning where the field variables are modeled by neural maps endowed with (scalar or distributed) auxiliary parameters such as physical unknowns and loss function weights. PINNs are then trained by minimizing a measure of data misfit subject to the underlying physical laws as constraints. In this study, to facilitate learning from ultrasonic data, the PINNs loss adopts (a) wavenumber-dependent Sobolev norms to compute the data misfit, and (b) non-adaptive weights in a specific scaling framework to naturally balance the loss objectives by leveraging the form of PDEs germane to elastic-wave propagation. Both paradigms are examined via synthetic and laboratory test data. In the latter case, the reconstructions are performed at multiple frequencies and the results are verified by a set of complementary experiments highlighting the importance of verification and validation in data-driven modeling.

Submitted 6 January, 2023; originally announced January 2023.

arXiv:2301.01679 [pdf, other]

COVID-Net USPro: An Open-Source Explainable Few-Shot Deep Prototypical Network to Monitor and Detect COVID-19 Infection from Point-of-Care Ultrasound Images

Authors: Jessy Song, Ashkan Ebadi, Adrian Florea, Pengcheng Xi, Stéphane Tremblay, Alexander Wong

Abstract: As the Coronavirus Disease 2019 (COVID-19) continues to impact many aspects of life and the global healthcare systems, the adoption of rapid and effective screening methods to prevent further spread of the virus and lessen the burden on healthcare providers is a necessity. As a cheap and widely accessible medical image modality, point-of-care ultrasound (POCUS) imaging allows radiologists to identify symptoms and assess severity through visual inspection of the chest ultrasound images. Combined with the recent advancements in computer science, applications of deep learning techniques in medical image analysis have shown promising results, demonstrating that artificial intelligence-based solutions can accelerate the diagnosis of COVID-19 and lower the burden on healthcare professionals. However, the lack of a huge amount of well-annotated data poses a challenge in building effective deep neural networks in the case of novel diseases and pandemics. Motivated by this, we present COVID-Net USPro, an explainable few-shot deep prototypical network, that monitors and detects COVID-19 positive cases with high precision and recall from minimal ultrasound images. COVID-Net USPro achieves 99.65% overall accuracy, 99.7% recall and 99.67% precision for COVID-19 positive cases when trained with only 5 shots. The analytic pipeline and results were verified by our contributing clinician with extensive experience in POCUS interpretation, ensuring that the network makes decisions based on actual patterns.

Submitted 4 January, 2023; originally announced January 2023.

Comments: 12 pages, 5 figures

arXiv:2212.07898 [pdf, other]

An Effective Methodology for Short-Circuit Calculation of Power Systems Dominated by Power Electronics Converters Considering Unbalanced Voltage Conditions and Converter Limits

Authors: Jie Song, Marc Cheah-Mane, Eduardo Prieto-Araujo, Oriol Gomis-Bellmunt

Abstract: This paper deals with the challenge of short-circuit calculation for power systems dominated by power electronics converters. A novel methodology is presented to identify the short-circuit equilibrium point of the studied system considering the operation and limitations of power converters. The studied system is modeled with an element-based steady-state formulation. In particular, the governing equations implemented on converter control, which involve a specific control mode and various potential current-saturation states, are included. Then, an effective and efficient approach is proposed to identify the short-circuit equilibrium points that satisfy the converter limitations. Numerical case studies with VSCs show that the proposed methodology can identify the short-circuit equilibrium point efficiently and accurately with different types and depths of short-circuit fault. The short-circuit calculation results have been validated with dynamic simulations.

Submitted 19 December, 2022; v1 submitted 15 December, 2022; originally announced December 2022.

arXiv:2211.13933 [pdf, ps, other]

Enhanced Tracking and Beamforming Codebook Design for Wideband Terahertz Massive MIMO System

Authors: Xu Shi, Jintao Wang, Jian Song

Abstract: True-time-delay (TTD) lines are recently applied inside Terahertz (THz) hybrid-precoding transceiver to acquire high beamforming gain against beam squint effect. However, beam tracking turns into a challenging puzzle where enormous potential beam directions bring about unacceptable overhead consumption. Frequency-scanning-based beam tracking is initially explored but still imperfect in previous studies. In this paper, based on TTD-aided hybrid precoding structure, we give an enhanced frequency-scanning-based tracking scheme. Multiple beams are generated and utilized simultaneously via several subcarriers for tracking at one timeslot. The squint beams' angular coverage at all subcarriers can be flexibly controlled by two different subcarrier-angular mapping policies, named forward-pairing and backward-pairing. Then multiple physical directions can be simultaneously searched in one timeslot for lower overhead consumption. Besides, closed-form searching radius bound, parameter configuration and interferences are theoretically analyzed. Furthermore, we provide the coupled codebook design for TTDs and phase shifters (PSs), with joint consideration of both beamforming and tracking. Analytical and numerical results demonstrate the superiority of the new frequency-scanning-based tracking scheme and beamforming codebook.

Submitted 25 November, 2022; originally announced November 2022.

Comments: This work has been submitted to the IEEE journal for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2210.03345 [pdf, ps, other]

doi 10.1109/TWC.2023.3303229

Spatial-chirp Codebook-based Hierarchical Beam Training for Extremely Large-Scale Massive MIMO

Authors: Xu Shi, Jintao Wang, Zhi Sun, Jian Song

Abstract: Extremely large-scale multiple-input multiple-output (XL-MIMO) promises to provide ultrahigh data rates in millimeter-wave (mmWave) and Terahertz (THz) spectrum. However, the spherical-wavefront wireless transmission caused by large aperture array presents huge challenges for channel state information (CSI) acquisition and beamforming. Two independent parameters (physical angles and transmission distance) should be simultaneously considered in XL-MIMO beamforming, which brings severe overhead consumption and beamforming degradation. To address this problem, we exploit the near-field channel characteristic and propose two low-overhead hierarchical beam training schemes for near-field XL-MIMO system. Firstly, we project near-field channel into spatial-angular domain and slope-intercept domain to capture detailed representations. Then we point out three critical criteria for XL-MIMO hierarchical beam training. Secondly, a novel spatial-chirp beam-aided codebook and corresponding hierarchical update policy are proposed. Thirdly, given the imperfect coverage and overlapping of spatial-chirp beams, we further design an enhanced hierarchical training codebook via manifold optimization and alternative minimization. Theoretical analyses and numerical simulations are also displayed to verify the superior performances on beamforming and training overhead.

Submitted 15 August, 2023; v1 submitted 7 October, 2022; originally announced October 2022.

Comments: accepted by IEEE Transactions on Wireless Communications (TWC). DOI: 10.1109/TWC.2023.3303229

arXiv:2209.12818 [pdf, other]

Spatial Signal Design for Positioning via End-to-End Learning

Authors: Steven Rivetti, Josè Miguel Mateos-Ramos, Yibo Wu, Jinxiang Song, Musa Furkan Keskin, Vijaya Yajnanarayana, Christian Häger, Henk Wymeersch

Abstract: This letter considers the problem of end-to-end learning for joint optimization of transmitter precoding and receiver processing for mmWave downlink positioning. Considering a multiple-input single-output (MISO) scenario, we propose a novel autoencoder (AE) architecture to estimate user-equipment (UE) position with multiple base-stations (BSs) and demonstrate that end-to-end learning can match model-based design, both for angle of departure (AoD) and position estimation, under ideal conditions without model deficits and outperform it in the presence of hardware impairments.

Submitted 2 December, 2022; v1 submitted 26 September, 2022; originally announced September 2022.

arXiv:2209.11888 [pdf, other]

JPEG Artifact Correction using Denoising Diffusion Restoration Models

Authors: Bahjat Kawar, Jiaming Song, Stefano Ermon, Michael Elad

Abstract: Diffusion models can be used as learned priors for solving various inverse problems. However, most existing approaches are restricted to linear inverse problems, limiting their applicability to more general cases. In this paper, we build upon Denoising Diffusion Restoration Models (DDRM) and propose a method for solving some non-linear inverse problems. We leverage the pseudo-inverse operator used in DDRM and generalize this concept for other measurement operators, which allows us to use pre-trained unconditional diffusion models for applications such as JPEG artifact correction. We empirically demonstrate the effectiveness of our approach across various quality factors, attaining performance levels that are on par with state-of-the-art methods trained specifically for the JPEG restoration task.

Submitted 23 November, 2022; v1 submitted 23 September, 2022; originally announced September 2022.

Comments: Presented at NeurIPS 2022 Workshop on Score-Based Methods. Code: https://github.com/bahjat-kawar/ddrm-jpeg

arXiv:2209.10843 [pdf, ps, other]

A Unified Joint Optimization of Training Sequences and Transceivers Based on Matrix-Monotonic Optimization

Authors: Chengwen Xing, Tao Yu, Jinpeng Song, Zhong Zheng, Lian Zhao, Lajos Hanzo

Abstract: Channel estimation and data transmission constitute the most fundamental functional modules of multiple-input multiple-output (MIMO) communication systems. The underlying key tasks corresponding to these modules are training sequence optimization and transceiver optimization. Hence, we jointly optimize the linear transmit precoder and the training sequence of MIMO systems using the metrics of their effective mutual information (MI), effective mean squared error (MSE), effective weighted MI, effective weighted MSE, as well as their effective generic Schur-convex and Schur-concave functions. Both statistical channel state information (CSI) and estimated CSI are considered at the transmitter in the joint optimization. A unified framework termed as joint matrix-monotonic optimization is proposed. Based on this, the optimal precoder matrix and training matrix structures can be derived for both CSI scenarios. Then, based on the optimal matrix structures, our linear transceivers and their training sequences can be jointly optimized. Compared to state-of-the-art benchmark algorithms, the proposed algorithms visualize the bold explicit relationships between the attainable system performance of our linear transceivers conceived and their training sequences, leading to implementation ready recipes. Finally, several numerical results are provided, which corroborate our theoretical results and demonstrate the compelling benefits of our proposed pilot-aided MIMO solutions.

Submitted 17 July, 2023; v1 submitted 22 September, 2022; originally announced September 2022.

Comments: 39 pages, 9 figures, 1 table, manuscript accepted in IEEE TVT

arXiv:2209.00992 [pdf, other]

Single-scatter channel impulse response model of non-line-of-sight ultraviolet communications

Authors: Tian Cao, Shihan Chen, Tianfeng Wu, Changyong Pan, Jian Song

Abstract: Previous studies on the temporal characteristics of single-scatter transmission in non-line-of-sight (NLOS) ultraviolet communications (UVC) were based on the prolate-spheroidal coordinate system. In this work, a novel single-scatter channel impulse response (CIR) model is proposed in the spherical coordinate system, which is more natural and comprehensible than the prolate-spheroidal coordinate system in practical applications. Additionally, the results of the widely accepted Monte-Carlo (MC)-based channel model of NLOS UVC are provided to verify the proposed single-scatter CIR model. Results indicate that the computational time required by the proposed single-scatter CIR model is less than 0.7% of that of the MC-based one, with comparable accuracy in assessing the temporal characteristics of NLOS UVC channels.

Submitted 28 August, 2022; originally announced September 2022.

Comments: 10 pages, 4 figures

Showing 1–50 of 119 results for author: Song, J