Search | arXiv e-print repository

Towards A Generalizable Pathology Foundation Model via Unified Knowledge Distillation

Authors: Jiabo Ma, Zhengrui Guo, Fengtao Zhou, Yihui Wang, Yingxue Xu, Yu Cai, Zhengjie Zhu, Cheng Jin, Yi Lin, Xinrui Jiang, Anjia Han, Li Liang, Ronald Cheong Kin Chan, Jiguang Wang, Kwang-Ting Cheng, Hao Chen

Abstract: Foundation models pretrained on large-scale datasets are revolutionizing the field of computational pathology (CPath). The generalization ability of foundation models is crucial for the success in various downstream clinical tasks. However, current foundation models have only been evaluated on a limited type and number of tasks, leaving their generalization ability and overall performance unclear.… ▽ More Foundation models pretrained on large-scale datasets are revolutionizing the field of computational pathology (CPath). The generalization ability of foundation models is crucial for the success in various downstream clinical tasks. However, current foundation models have only been evaluated on a limited type and number of tasks, leaving their generalization ability and overall performance unclear. To address this gap, we established a most comprehensive benchmark to evaluate the performance of off-the-shelf foundation models across six distinct clinical task types, encompassing a total of 39 specific tasks. Our findings reveal that existing foundation models excel at certain task types but struggle to effectively handle the full breadth of clinical tasks. To improve the generalization of pathology foundation models, we propose a unified knowledge distillation framework consisting of both expert and self knowledge distillation, where the former allows the model to learn from the knowledge of multiple expert models, while the latter leverages self-distillation to enable image representation learning via local-global alignment. Based on this framework, a Generalizable Pathology Foundation Model (GPFM) is pretrained on a large-scale dataset consisting of 190 million images from around 86,000 public H&E whole slides across 34 major tissue types. Evaluated on the established benchmark, GPFM achieves an impressive average rank of 1.36, with 29 tasks ranked 1st, while the the second-best model, UNI, attains an average rank of 2.96, with only 4 tasks ranked 1st. The superior generalization of GPFM demonstrates its exceptional modeling capabilities across a wide range of clinical tasks, positioning it as a new cornerstone for feature representation in CPath. △ Less

Submitted 3 August, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

Report number: I.2.10

arXiv:2407.16404 [pdf]

Evaluating Uncertainties in Electricity Markets via Machine Learning and Quantum Computing

Authors: Shuyang Zhu, Ziqing Zhu, Linghua Zhu, Yujian Ye, Siqi Bu, Sasa Z. Djokic

Abstract: The analysis of decision-making process in electricity markets is crucial for understanding and resolving issues related to market manipulation and reduced social welfare. Traditional Multi-Agent Reinforcement Learning (MARL) method can model decision-making of generation companies (GENCOs), but faces challenges due to uncertainties in policy functions, reward functions, and inter-agent interactio… ▽ More The analysis of decision-making process in electricity markets is crucial for understanding and resolving issues related to market manipulation and reduced social welfare. Traditional Multi-Agent Reinforcement Learning (MARL) method can model decision-making of generation companies (GENCOs), but faces challenges due to uncertainties in policy functions, reward functions, and inter-agent interactions. Quantum computing offers a promising solution to resolve these uncertainties, and this paper introduces the Quantum Multi-Agent Deep Q-Network (Q-MADQN) method, which integrates variational quantum circuits into the traditional MARL framework. The main contributions of the paper are: identifying the correspondence between market uncertainties and quantum properties, proposing the Q-MADQN algorithm for simulating electricity market bidding, and demonstrating that Q-MADQN allows for a more thorough exploration and simulates more potential bidding strategies of profit-oriented GENCOs, compared to conventional methods, without compromising computational efficiency. The proposed method is illustrated on IEEE 30-bus test network, confirming that it offers a more accurate model for simulating complex market dynamics. △ Less

Submitted 23 July, 2024; originally announced July 2024.

Comments: 3 pages, 3 figures, plan for submitting to IEEE Power Engineering Letters

arXiv:2407.11389 [pdf, ps, other]

Spatial-spectral Cell-free Networks: A Large-scale Case Study

Authors: Zesheng Zhu, Lifeng Wang, Xin Wang, Dongming Wang, Kai-Kit Wong

Abstract: This paper studies the large-scale cell-free networks where dense distributed access points (APs) serve many users. As a promising next-generation network architecture, cell-free networks enable ultra-reliable connections and minimal fading/blockage, which are much favorable to the millimeter wave and Terahertz transmissions. However, conventional beam management with large phased arrays in a cell… ▽ More This paper studies the large-scale cell-free networks where dense distributed access points (APs) serve many users. As a promising next-generation network architecture, cell-free networks enable ultra-reliable connections and minimal fading/blockage, which are much favorable to the millimeter wave and Terahertz transmissions. However, conventional beam management with large phased arrays in a cell is very time-consuming in the higher-frequencies, and could be worsened when deploying a large number of coordinated APs in the cell-free systems. To tackle this challenge, the spatial-spectral cell-free networks with the leaky-wave antennas are established by coupling the propagation angles with frequencies. The beam training overhead in this direction can be significantly reduced through exploiting such spatial-spectral coupling effects. In the considered large-scale spatial-spectral cell-free networks, a novel subchannel allocation solution at sub-terahertz bands is proposed by leveraging the relationship between cross-entropy method and mixture model. Since initial access and AP clustering play a key role in achieving scalable large-scale cell-free networks, a hierarchical AP clustering solution is proposed to make the joint initial access and cluster formation, which is adaptive and has no need to initialize the number of AP clusters. After AP clustering, a subchannel allocation solution is devised to manage the interference between AP clusters. Numerical results are presented to confirm the efficiency of the proposed solutions and indicate that besides subchannel allocation, AP clustering can also have a big impact on the large-scale cell-free network performance at sub-terahertz bands. △ Less

Submitted 16 July, 2024; originally announced July 2024.

arXiv:2407.10408 [pdf, other]

Latency Minimization for IRS-enhanced Wideband MEC Networks with Practical Reflection Model

Authors: N. Li, W. Hao, X. Li, Z. Zhu, Z. Tang, S. Yang

Abstract: Intelligent reflecting surface (IRS) has been considered as an efficient way to boost the computation capability of mobile edge computing (MEC) system, especially when the communication links is blocked or the communication signal is weak. However, most existing works are restricted to narrow-band channel and ideal IRS reflection model, which is not practical and may lead to significant performanc… ▽ More Intelligent reflecting surface (IRS) has been considered as an efficient way to boost the computation capability of mobile edge computing (MEC) system, especially when the communication links is blocked or the communication signal is weak. However, most existing works are restricted to narrow-band channel and ideal IRS reflection model, which is not practical and may lead to significant performance degradation in realistic systems. To further exploit the benefits of IRS in MEC system, we consider an IRS-enhanced wideband MEC system with practical IRS reflection model. With the aim of minimizing the weighted latency of all devices, the offloading data volume, edge computing resource, BS's receiving vector, and IRS passive beamforming are jointly optimized. Since the formulated problem is non-convex, we employ the block coordinate descent (BCD) technique to decouple it into two subproblems for alternatively optimizing computing and communication settings. The effectiveness and convergence of the proposed algorithm are validate via numerical analyses. In addition, simulation results demonstrate that the proposed algorithm can achieve lower latency compared to that based on the ideal IRS reflection model, which confirms the necessary of considering practical model when designing an IRS-enhanced wideband MEC system. △ Less

Submitted 14 July, 2024; originally announced July 2024.

Comments: 13 pages, 9 figures

arXiv:2407.04561 [pdf, other]

Wireless Spectrum in Rural Farmlands: Status, Challenges and Opportunities

Authors: Mukaram Shahid, Kunal Das, Taimoor Ul Islam, Christ Somiah, Daji Qiao, Arsalan Ahmad, Jimming Song, Zhengyuan Zhu, Sarath Babu, Yong Guan, Tusher Chakraborty, Suraj Jog, Ranveer Chandra, Hongwei Zhang

Abstract: Due to factors such as low population density and expansive geographical distances, network deployment falls behind in rural regions, leading to a broadband divide. Wireless spectrum serves as the blood and flesh of wireless communications. Shared white spaces such as those in the TVWS and CBRS spectrum bands offer opportunities to expand connectivity, innovate, and provide affordable access to hi… ▽ More Due to factors such as low population density and expansive geographical distances, network deployment falls behind in rural regions, leading to a broadband divide. Wireless spectrum serves as the blood and flesh of wireless communications. Shared white spaces such as those in the TVWS and CBRS spectrum bands offer opportunities to expand connectivity, innovate, and provide affordable access to high-speed Internet in under-served areas without additional cost to expensive licensed spectrum. However, the current methods to utilize these white spaces are inefficient due to very conservative models and spectrum policies, causing under-utilization of valuable spectrum resources. This hampers the full potential of innovative wireless technologies that could benefit farmers, small Internet Service Providers (ISPs) or Mobile Network Operators (MNOs) operating in rural regions. This study explores the challenges faced by farmers and service providers when using shared spectrum bands to deploy their networks while ensuring maximum system performance and minimizing interference with other users. Additionally, we discuss how spatiotemporal spectrum models, in conjunction with database-driven spectrum-sharing solutions, can enhance the allocation and management of spectrum resources, ultimately improving the efficiency and reliability of wireless networks operating in shared spectrum bands. △ Less

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2407.00964 [pdf, other]

Multi-Modal Fusion-Based Multi-Task Semantic Communication System

Authors: Zengle Zhu, Rongqing Zhang, Xiang Cheng, Liuqing Yang

Abstract: In recent years, there has been significant progress in semantic communication systems empowered by deep learning techniques. It has greatly improved the efficiency of information transmission. Nevertheless, traditional semantic communication models still face challenges, particularly due to their single-task and single-modal orientation. Many of these models are designed for specific tasks, which… ▽ More In recent years, there has been significant progress in semantic communication systems empowered by deep learning techniques. It has greatly improved the efficiency of information transmission. Nevertheless, traditional semantic communication models still face challenges, particularly due to their single-task and single-modal orientation. Many of these models are designed for specific tasks, which may result in limitations when applied to multi-task communication systems. Moreover, these models often overlook the correlations among different modal data in multi-modal tasks. It leads to an incomplete understanding of complex information, causing increased communication overhead and diminished performance. To address these problems, we propose a multi-modal fusion-based multi-task semantic communication (MFMSC) framework. In contrast to traditional semantic communication approaches, MFMSC can effectively handle various tasks across multiple modalities. Furthermore, we design a fusion module based on Bidirectional Encoder Representations from Transformers (BERT) for multi-modal semantic information fusion. By leveraging the powerful semantic understanding capabilities and self-attention mechanism of BERT, we achieve effective fusion of semantic information from different modalities. We compare our model with multiple benchmarks. Simulation results show that MFMSC outperforms these models in terms of both performance and communication overhead. △ Less

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2406.18009 [pdf, other]

E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS

Authors: Sefik Emre Eskimez, Xiaofei Wang, Manthan Thakker, Canrun Li, Chung-Hsien Tsai, Zhen Xiao, Hemin Yang, Zirun Zhu, Min Tang, Xu Tan, Yanqing Liu, Sheng Zhao, Naoyuki Kanda

Abstract: This paper introduces Embarrassingly Easy Text-to-Speech (E2 TTS), a fully non-autoregressive zero-shot text-to-speech system that offers human-level naturalness and state-of-the-art speaker similarity and intelligibility. In the E2 TTS framework, the text input is converted into a character sequence with filler tokens. The flow-matching-based mel spectrogram generator is then trained based on the… ▽ More This paper introduces Embarrassingly Easy Text-to-Speech (E2 TTS), a fully non-autoregressive zero-shot text-to-speech system that offers human-level naturalness and state-of-the-art speaker similarity and intelligibility. In the E2 TTS framework, the text input is converted into a character sequence with filler tokens. The flow-matching-based mel spectrogram generator is then trained based on the audio infilling task. Unlike many previous works, it does not require additional components (e.g., duration model, grapheme-to-phoneme) or complex techniques (e.g., monotonic alignment search). Despite its simplicity, E2 TTS achieves state-of-the-art zero-shot TTS capabilities that are comparable to or surpass previous works, including Voicebox and NaturalSpeech 3. The simplicity of E2 TTS also allows for flexibility in the input representation. We propose several variants of E2 TTS to improve usability during inference. See https://aka.ms/e2tts/ for demo samples. △ Less

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.09931 [pdf, other]

SCKansformer: Fine-Grained Classification of Bone Marrow Cells via Kansformer Backbone and Hierarchical Attention Mechanisms

Authors: Yifei Chen, Zhu Zhu, Shenghao Zhu, Linwei Qiu, Binfeng Zou, Fan Jia, Yunpeng Zhu, Chenyan Zhang, Zhaojie Fang, Feiwei Qin, Jin Fan, Changmiao Wang, Yu Gao, Gang Yu

Abstract: The incidence and mortality rates of malignant tumors, such as acute leukemia, have risen significantly. Clinically, hospitals rely on cytological examination of peripheral blood and bone marrow smears to diagnose malignant tumors, with accurate blood cell counting being crucial. Existing automated methods face challenges such as low feature expression capability, poor interpretability, and redund… ▽ More The incidence and mortality rates of malignant tumors, such as acute leukemia, have risen significantly. Clinically, hospitals rely on cytological examination of peripheral blood and bone marrow smears to diagnose malignant tumors, with accurate blood cell counting being crucial. Existing automated methods face challenges such as low feature expression capability, poor interpretability, and redundant feature extraction when processing high-dimensional microimage data. We propose a novel fine-grained classification model, SCKansformer, for bone marrow blood cells, which addresses these challenges and enhances classification accuracy and efficiency. The model integrates the Kansformer Encoder, SCConv Encoder, and Global-Local Attention Encoder. The Kansformer Encoder replaces the traditional MLP layer with the KAN, improving nonlinear feature representation and interpretability. The SCConv Encoder, with its Spatial and Channel Reconstruction Units, enhances feature representation and reduces redundancy. The Global-Local Attention Encoder combines Multi-head Self-Attention with a Local Part module to capture both global and local features. We validated our model using the Bone Marrow Blood Cell Fine-Grained Classification Dataset (BMCD-FGCD), comprising over 10,000 samples and nearly 40 classifications, developed with a partner hospital. Comparative experiments on our private dataset, as well as the publicly available PBC and ALL-IDB datasets, demonstrate that SCKansformer outperforms both typical and advanced microcell classification methods across all datasets. Our source code and private BMCD-FGCD dataset are available at https://github.com/JustlfC03/SCKansformer. △ Less

Submitted 14 June, 2024; originally announced June 2024.

Comments: 15 pages, 6 figures

arXiv:2406.06002 [pdf, other]

Computational and Statistical Guarantees for Tensor-on-Tensor Regression with Tensor Train Decomposition

Authors: Zhen Qin, Zhihui Zhu

Abstract: Recently, a tensor-on-tensor (ToT) regression model has been proposed to generalize tensor recovery, encompassing scenarios like scalar-on-tensor regression and tensor-on-vector regression. However, the exponential growth in tensor complexity poses challenges for storage and computation in ToT regression. To overcome this hurdle, tensor decompositions have been introduced, with the tensor train (T… ▽ More Recently, a tensor-on-tensor (ToT) regression model has been proposed to generalize tensor recovery, encompassing scenarios like scalar-on-tensor regression and tensor-on-vector regression. However, the exponential growth in tensor complexity poses challenges for storage and computation in ToT regression. To overcome this hurdle, tensor decompositions have been introduced, with the tensor train (TT)-based ToT model proving efficient in practice due to reduced memory requirements, enhanced computational efficiency, and decreased sampling complexity. Despite these practical benefits, a disparity exists between theoretical analysis and real-world performance. In this paper, we delve into the theoretical and algorithmic aspects of the TT-based ToT regression model. Assuming the regression operator satisfies the restricted isometry property (RIP), we conduct an error analysis for the solution to a constrained least-squares optimization problem. This analysis includes upper error bound and minimax lower bound, revealing that such error bounds polynomially depend on the order $N+M$. To efficiently find solutions meeting such error bounds, we propose two optimization algorithms: the iterative hard thresholding (IHT) algorithm (employing gradient descent with TT-singular value decomposition (TT-SVD)) and the factorization approach using the Riemannian gradient descent (RGD) algorithm. When RIP is satisfied, spectral initialization facilitates proper initialization, and we establish the linear convergence rate of both IHT and RGD. △ Less

Submitted 9 June, 2024; originally announced June 2024.

Comments: arXiv admin note: text overlap with arXiv:2401.02592

arXiv:2406.05699 [pdf, ps, other]

An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS

Authors: Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Hemin Yang, Zirun Zhu, Min Tang, Yufei Xia, Jinzhu Li, Sheng Zhao, Jinyu Li, Naoyuki Kanda

Abstract: Recently, zero-shot text-to-speech (TTS) systems, capable of synthesizing any speaker's voice from a short audio prompt, have made rapid advancements. However, the quality of the generated speech significantly deteriorates when the audio prompt contains noise, and limited research has been conducted to address this issue. In this paper, we explored various strategies to enhance the quality of audi… ▽ More Recently, zero-shot text-to-speech (TTS) systems, capable of synthesizing any speaker's voice from a short audio prompt, have made rapid advancements. However, the quality of the generated speech significantly deteriorates when the audio prompt contains noise, and limited research has been conducted to address this issue. In this paper, we explored various strategies to enhance the quality of audio generated from noisy audio prompts within the context of flow-matching-based zero-shot TTS. Our investigation includes comprehensive training strategies: unsupervised pre-training with masked speech denoising, multi-speaker detection and DNSMOS-based data filtering on the pre-training data, and fine-tuning with random noise mixing. The results of our experiments demonstrate significant improvements in intelligibility, speaker similarity, and overall audio quality compared to the approach of applying speech enhancement to the audio prompt. △ Less

Submitted 9 June, 2024; originally announced June 2024.

Comments: Accepted to INTERSPEECH2024

arXiv:2406.04281 [pdf, other]

Total-Duration-Aware Duration Modeling for Text-to-Speech Systems

Authors: Sefik Emre Eskimez, Xiaofei Wang, Manthan Thakker, Chung-Hsien Tsai, Canrun Li, Zhen Xiao, Hemin Yang, Zirun Zhu, Min Tang, Jinyu Li, Sheng Zhao, Naoyuki Kanda

Abstract: Accurate control of the total duration of generated speech by adjusting the speech rate is crucial for various text-to-speech (TTS) applications. However, the impact of adjusting the speech rate on speech quality, such as intelligibility and speaker characteristics, has been underexplored. In this work, we propose a novel total-duration-aware (TDA) duration model for TTS, where phoneme durations a… ▽ More Accurate control of the total duration of generated speech by adjusting the speech rate is crucial for various text-to-speech (TTS) applications. However, the impact of adjusting the speech rate on speech quality, such as intelligibility and speaker characteristics, has been underexplored. In this work, we propose a novel total-duration-aware (TDA) duration model for TTS, where phoneme durations are predicted not only from the text input but also from an additional input of the total target duration. We also propose a MaskGIT-based duration model that enhances the diversity and quality of the predicted phoneme durations. Our results demonstrate that the proposed TDA duration models achieve better intelligibility and speaker similarity for various speech rate configurations compared to the baseline models. We also show that the proposed MaskGIT-based model can generate phoneme durations with higher quality and diversity compared to its regression or flow-matching counterparts. △ Less

Submitted 6 June, 2024; originally announced June 2024.

Comments: Accepted to Interspeech 2024

arXiv:2405.10691 [pdf, other]

LoCI-DiffCom: Longitudinal Consistency-Informed Diffusion Model for 3D Infant Brain Image Completion

Authors: Zihao Zhu, Tianli Tao, Yitian Tao, Haowen Deng, Xinyi Cai, Gaofeng Wu, Kaidong Wang, Haifeng Tang, Lixuan Zhu, Zhuoyang Gu, Jiawei Huang, Dinggang Shen, Han Zhang

Abstract: The infant brain undergoes rapid development in the first few years after birth.Compared to cross-sectional studies, longitudinal studies can depict the trajectories of infants brain development with higher accuracy, statistical power and flexibility.However, the collection of infant longitudinal magnetic resonance (MR) data suffers a notorious dropout problem, resulting in incomplete datasets wit… ▽ More The infant brain undergoes rapid development in the first few years after birth.Compared to cross-sectional studies, longitudinal studies can depict the trajectories of infants brain development with higher accuracy, statistical power and flexibility.However, the collection of infant longitudinal magnetic resonance (MR) data suffers a notorious dropout problem, resulting in incomplete datasets with missing time points. This limitation significantly impedes subsequent neuroscience and clinical modeling. Yet, existing deep generative models are facing difficulties in missing brain image completion, due to sparse data and the nonlinear, dramatic contrast/geometric variations in the developing brain. We propose LoCI-DiffCom, a novel Longitudinal Consistency-Informed Diffusion model for infant brain image Completion,which integrates the images from preceding and subsequent time points to guide a diffusion model for generating high-fidelity missing data. Our designed LoCI module can work on highly sparse sequences, relying solely on data from two temporal points. Despite wide separation and diversity between age time points, our approach can extract individualized developmental features while ensuring context-aware consistency. Our experiments on a large infant brain MR dataset demonstrate its effectiveness with consistent performance on missing infant brain MR completion even in big gap scenarios, aiding in better delineation of early developmental trajectories. △ Less

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2404.19167 [pdf]

Advancing low-field MRI with a universal denoising imaging transformer: Towards fast and high-quality imaging

Authors: Zheren Zhu, Azaan Rehman, Xiaozhi Cao, Congyu Liao, Yoo Jin Lee, Michael Ohliger, Hui Xue, Yang Yang

Abstract: Recent developments in low-field (LF) magnetic resonance imaging (MRI) systems present remarkable opportunities for affordable and widespread MRI access. A robust denoising method to overcome the intrinsic low signal-noise-ratio (SNR) barrier is critical to the success of LF MRI. However, current data-driven MRI denoising methods predominantly handle magnitude images and rely on customized models… ▽ More Recent developments in low-field (LF) magnetic resonance imaging (MRI) systems present remarkable opportunities for affordable and widespread MRI access. A robust denoising method to overcome the intrinsic low signal-noise-ratio (SNR) barrier is critical to the success of LF MRI. However, current data-driven MRI denoising methods predominantly handle magnitude images and rely on customized models with constrained data diversity and quantity, which exhibit limited generalizability in clinical applications across diverse MRI systems, pulse sequences, and organs. In this study, we present ImT-MRD: a complex-valued imaging transformer trained on a vast number of clinical MRI scans aiming at universal MR denoising at LF systems. Compared with averaging multiple-repeated scans for higher image SNR, the model obtains better image quality from fewer repetitions, demonstrating its capability for accelerating scans under various clinical settings. Moreover, with its complex-valued image input, the model can denoise intermediate results before advanced post-processing and prepare high-quality data for further MRI research. By delivering universal and accurate denoising across clinical and research tasks, our model holds great promise to expedite the evolution of LF MRI for accessible and equal biomedical applications. △ Less

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.10643 [pdf, other]

A Calibrated and Automated Simulator for Innovations in 5G

Authors: Conrado Boeira, Antor Hasan, Khaleda Papry, Yue Ju, Zhongwen Zhu, Israat Haque

Abstract: The rise of 5G deployments has created the environment for many emerging technologies to flourish. Self-driving vehicles, Augmented and Virtual Reality, and remote operations are examples of applications that leverage 5G networks' support for extremely low latency, high bandwidth, and increased throughput. However, the complex architecture of 5G hinders innovation due to the lack of accessibility… ▽ More The rise of 5G deployments has created the environment for many emerging technologies to flourish. Self-driving vehicles, Augmented and Virtual Reality, and remote operations are examples of applications that leverage 5G networks' support for extremely low latency, high bandwidth, and increased throughput. However, the complex architecture of 5G hinders innovation due to the lack of accessibility to testbeds or realistic simulators with adequate 5G functionalities. Also, configuring and managing simulators are complex and time consuming. Finally, the lack of adequate representative data hinders the data-driven designs in 5G campaigns. Thus, we calibrated a system-level open-source simulator, Simu5G, following 3GPP guidelines to enable faster innovation in the 5G domain. Furthermore, we developed an API for automatic simulator configuration without knowing the underlying architectural details. Finally, we demonstrate the usage of the calibrated and automated simulator by developing an ML-based anomaly detection in a 5G Radio Access Network (RAN). △ Less

Submitted 16 April, 2024; originally announced April 2024.

arXiv:2404.02382 [pdf]

Imaging transformer for MRI denoising with the SNR unit training: enabling generalization across field-strengths, imaging contrasts, and anatomy

Authors: Hui Xue, Sarah Hooper, Azaan Rehman, Iain Pierce, Thomas Treibel, Rhodri Davies, W Patricia Bandettini, Rajiv Ramasawmy, Ahsan Javed, Zheren Zhu, Yang Yang, James Moon, Adrienne Campbell, Peter Kellman

Abstract: The ability to recover MRI signal from noise is key to achieve fast acquisition, accurate quantification, and high image quality. Past work has shown convolutional neural networks can be used with abundant and paired low and high-SNR images for training. However, for applications where high-SNR data is difficult to produce at scale (e.g. with aggressive acceleration, high resolution, or low field… ▽ More The ability to recover MRI signal from noise is key to achieve fast acquisition, accurate quantification, and high image quality. Past work has shown convolutional neural networks can be used with abundant and paired low and high-SNR images for training. However, for applications where high-SNR data is difficult to produce at scale (e.g. with aggressive acceleration, high resolution, or low field strength), training a new denoising network using a large quantity of high-SNR images can be infeasible. In this study, we overcome this limitation by improving the generalization of denoising models, enabling application to many settings beyond what appears in the training data. Specifically, we a) develop a training scheme that uses complex MRIs reconstructed in the SNR units (i.e., the images have a fixed noise level, SNR unit training) and augments images with realistic noise based on coil g-factor, and b) develop a novel imaging transformer (imformer) to handle 2D, 2D+T, and 3D MRIs in one model architecture. Through empirical evaluation, we show this combination improves performance compared to CNN models and improves generalization, enabling a denoising model to be used across field-strengths, image contrasts, and anatomy. △ Less

Submitted 2 April, 2024; originally announced April 2024.

arXiv:2403.20141 [pdf]

doi 10.1080/10298436.2024.2310095

Development of a full-Scale approach to predict overlay reflective crack

Authors: Zehui Zhu, Imad L. Al-Qadi

Abstract: Resurfacing a moderately deteriorated Portland cement concrete (PCC) pavement with asphalt concrete (AC) layers is considered an efficient rehabilitation practice. However, reflective cracks may develop shortly after resurfacing because of discontinuities (e.g. joints and cracks) in existing PCC pavement. In this paper, a new accelerated full-scale testing approach was developed to study reflectiv… ▽ More Resurfacing a moderately deteriorated Portland cement concrete (PCC) pavement with asphalt concrete (AC) layers is considered an efficient rehabilitation practice. However, reflective cracks may develop shortly after resurfacing because of discontinuities (e.g. joints and cracks) in existing PCC pavement. In this paper, a new accelerated full-scale testing approach was developed to study reflective crack growth in AC overlays. Two hydraulic actuators were used to simulate a moving dual-tire assembly with a loading rate of more than five-thousand-wheel passes per hour. A load cycle consists of three steps, simulating a tire approaching, moving across, and leaving a PCC discontinuity. Experiments were conducted to compare the reflective crack behaviour of two overlay configurations. Both test sections were fully cracked in less than an hour. The initiation and propagation of reflective cracks were explicitly documented using crack detectors in conjunction with a camera. The proposed full-scale testing protocol offers a repeatable and efficient approach to systematically investigate the effects of various overlay configurations, thus enabling the identification of optimal design against reflective cracking. △ Less

Submitted 29 March, 2024; originally announced March 2024.

Journal ref: International Journal of Pavement Engineering, 25(1), 2310095 (2024)

arXiv:2403.16476 [pdf]

A Method for Target Detection Based on Mmw Radar and Vision Fusion

Authors: Ming Zong, Jiaying Wu, Zhanyu Zhu, Jingen Ni

Abstract: An efficient and accurate traffic monitoring system often takes advantages of multi-sensor detection to ensure the safety of urban traffic, promoting the accuracy and robustness of target detection and tracking. A method for target detection using Radar-Vision Fusion Path Aggregation Fully Convolutional One-Stage Network (RV-PAFCOS) is proposed in this paper, which is extended from Fully Convoluti… ▽ More An efficient and accurate traffic monitoring system often takes advantages of multi-sensor detection to ensure the safety of urban traffic, promoting the accuracy and robustness of target detection and tracking. A method for target detection using Radar-Vision Fusion Path Aggregation Fully Convolutional One-Stage Network (RV-PAFCOS) is proposed in this paper, which is extended from Fully Convolutional One-Stage Network (FCOS) by introducing the modules of radar image processing branches, radar-vision fusion and path aggregation. The radar image processing branch mainly focuses on the image modeling based on the spatiotemporal calibration of millimeter-wave (mmw) radar and cameras, taking the conversion of radar point clouds to radar images. The fusion module extracts features of radar and optical images based on the principle of spatial attention stitching criterion. The path aggregation module enhances the reuse of feature layers, combining the positional information of shallow feature maps with deep semantic information, to obtain better detection performance for both large and small targets. Through the experimental analysis, the method proposed in this paper can effectively fuse the mmw radar and vision perceptions, showing good performance in traffic target detection. △ Less

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.11699 [pdf, other]

A Spatial-Temporal Progressive Fusion Network for Breast Lesion Segmentation in Ultrasound Videos

Authors: Zhengzheng Tu, Zigang Zhu, Yayang Duan, Bo Jiang, Qishun Wang, Chaoxue Zhang

Abstract: Ultrasound video-based breast lesion segmentation provides a valuable assistance in early breast lesion detection and treatment. However, existing works mainly focus on lesion segmentation based on ultrasound breast images which usually can not be adapted well to obtain desirable results on ultrasound videos. The main challenge for ultrasound video-based breast lesion segmentation is how to exploi… ▽ More Ultrasound video-based breast lesion segmentation provides a valuable assistance in early breast lesion detection and treatment. However, existing works mainly focus on lesion segmentation based on ultrasound breast images which usually can not be adapted well to obtain desirable results on ultrasound videos. The main challenge for ultrasound video-based breast lesion segmentation is how to exploit the lesion cues of both intra-frame and inter-frame simultaneously. To address this problem, we propose a novel Spatial-Temporal Progressive Fusion Network (STPFNet) for video based breast lesion segmentation problem. The main aspects of the proposed STPFNet are threefold. First, we propose to adopt a unified network architecture to capture both spatial dependences within each ultrasound frame and temporal correlations between different frames together for ultrasound data representation. Second, we propose a new fusion module, termed Multi-Scale Feature Fusion (MSFF), to fuse spatial and temporal cues together for lesion detection. MSFF can help to determine the boundary contour of lesion region to overcome the issue of lesion boundary blurring. Third, we propose to exploit the segmentation result of previous frame as the prior knowledge to suppress the noisy background and learn more robust representation. In particular, we introduce a new publicly available ultrasound video breast lesion segmentation dataset, termed UVBLS200, which is specifically dedicated to breast lesion segmentation. It contains 200 videos, including 80 videos of benign lesions and 120 videos of malignant lesions. Experiments on the proposed dataset demonstrate that the proposed STPFNet achieves better breast lesion detection performance than state-of-the-art methods. △ Less

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.11556 [pdf, other]

Hierarchical Frequency-based Upsampling and Refining for Compressed Video Quality Enhancement

Authors: Qianyu Zhang, Bolun Zheng, Xinying Chen, Quan Chen, Zhunjie Zhu, Canjin Wang, Zongpeng Li, Chengang Yan

Abstract: Video compression artifacts arise due to the quantization operation in the frequency domain. The goal of video quality enhancement is to reduce compression artifacts and reconstruct a visually-pleasant result. In this work, we propose a hierarchical frequency-based upsampling and refining neural network (HFUR) for compressed video quality enhancement. HFUR consists of two modules: implicit frequen… ▽ More Video compression artifacts arise due to the quantization operation in the frequency domain. The goal of video quality enhancement is to reduce compression artifacts and reconstruct a visually-pleasant result. In this work, we propose a hierarchical frequency-based upsampling and refining neural network (HFUR) for compressed video quality enhancement. HFUR consists of two modules: implicit frequency upsampling module (ImpFreqUp) and hierarchical and iterative refinement module (HIR). ImpFreqUp exploits DCT-domain prior derived through implicit DCT transform, and accurately reconstructs the DCT-domain loss via a coarse-to-fine transfer. Consequently, HIR is introduced to facilitate cross-collaboration and information compensation between the scales, thus further refine the feature maps and promote the visual quality of the final output. We demonstrate the effectiveness of the proposed modules via ablation experiments and visualized results. Extensive experiments on public benchmarks show that HFUR achieves state-of-the-art performance for both constant bit rate and constant QP modes. △ Less

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2402.16413 [pdf, other]

AI-enabled STAR-RIS aided MISO ISAC Secure Communications

Authors: Zhengyu Zhu, Mengfei Gong, Gangcan Sun, Peijia Liu, De Mi

Abstract: A simultaneous transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) aided integrated sensing and communication (ISAC) dual-secure communication system is studied in this paper. The sensed target and legitimate users (LUs) are situated on the opposite sides of the STAR-RIS, and the energy splitting and time switching protocols are applied in the STAR-RIS, respectively. The long… ▽ More A simultaneous transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) aided integrated sensing and communication (ISAC) dual-secure communication system is studied in this paper. The sensed target and legitimate users (LUs) are situated on the opposite sides of the STAR-RIS, and the energy splitting and time switching protocols are applied in the STAR-RIS, respectively. The long-term average security rate for LUs is maximized by the joint design of the base station (BS) transmit beamforming and receive filter, along with the STAR-RIS transmitting and reflecting coefficients, under guarantying the echo signal-to-noise ratio thresholds and rate constraints for the LUs. Since the channel information changes over time, conventional convex optimization techniques cannot provide the optimal performance for the system, and result in excessively high computational complexity in the exploration of the long-term gains for the system. Taking continuity control decisions into account, the deep deterministic policy gradient and soft actor-critic algorithms based on off-policy are applied to address the complex non-convex problem. Simulation results comprehensively evaluate the performance of the proposed two reinforcement learning algorithms and demonstrate that STAR-RIS is remarkably better than the two benchmarks in the ISAC system. △ Less

Submitted 27 February, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.13776 [pdf, other]

Cas-DiffCom: Cascaded diffusion model for infant longitudinal super-resolution 3D medical image completion

Authors: Lianghu Guo, Tianli Tao, Xinyi Cai, Zihao Zhu, Jiawei Huang, Lixuan Zhu, Zhuoyang Gu, Haifeng Tang, Rui Zhou, Siyan Han, Yan Liang, Qing Yang, Dinggang Shen, Han Zhang

Abstract: Early infancy is a rapid and dynamic neurodevelopmental period for behavior and neurocognition. Longitudinal magnetic resonance imaging (MRI) is an effective tool to investigate such a crucial stage by capturing the developmental trajectories of the brain structures. However, longitudinal MRI acquisition always meets a serious data-missing problem due to participant dropout and failed scans, makin… ▽ More Early infancy is a rapid and dynamic neurodevelopmental period for behavior and neurocognition. Longitudinal magnetic resonance imaging (MRI) is an effective tool to investigate such a crucial stage by capturing the developmental trajectories of the brain structures. However, longitudinal MRI acquisition always meets a serious data-missing problem due to participant dropout and failed scans, making longitudinal infant brain atlas construction and developmental trajectory delineation quite challenging. Thanks to the development of an AI-based generative model, neuroimage completion has become a powerful technique to retain as much available data as possible. However, current image completion methods usually suffer from inconsistency within each individual subject in the time dimension, compromising the overall quality. To solve this problem, our paper proposed a two-stage cascaded diffusion model, Cas-DiffCom, for dense and longitudinal 3D infant brain MRI completion and super-resolution. We applied our proposed method to the Baby Connectome Project (BCP) dataset. The experiment results validate that Cas-DiffCom achieves both individual consistency and high fidelity in longitudinal infant brain image completion. We further applied the generated infant brain images to two downstream tasks, brain tissue segmentation and developmental trajectory delineation, to declare its task-oriented potential in the neuroscience field. △ Less

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.07383 [pdf, other]

Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like

Authors: Naoyuki Kanda, Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Hemin Yang, Zirun Zhu, Min Tang, Canrun Li, Chung-Hsien Tsai, Zhen Xiao, Yufei Xia, Jinzhu Li, Yanqing Liu, Sheng Zhao, Michael Zeng

Abstract: Laughter is one of the most expressive and natural aspects of human speech, conveying emotions, social cues, and humor. However, most text-to-speech (TTS) systems lack the ability to produce realistic and appropriate laughter sounds, limiting their applications and user experience. While there have been prior works to generate natural laughter, they fell short in terms of controlling the timing an… ▽ More Laughter is one of the most expressive and natural aspects of human speech, conveying emotions, social cues, and humor. However, most text-to-speech (TTS) systems lack the ability to produce realistic and appropriate laughter sounds, limiting their applications and user experience. While there have been prior works to generate natural laughter, they fell short in terms of controlling the timing and variety of the laughter to be generated. In this work, we propose ELaTE, a zero-shot TTS that can generate natural laughing speech of any speaker based on a short audio prompt with precise control of laughter timing and expression. Specifically, ELaTE works on the audio prompt to mimic the voice characteristic, the text prompt to indicate the contents of the generated speech, and the input to control the laughter expression, which can be either the start and end times of laughter, or the additional audio prompt that contains laughter to be mimicked. We develop our model based on the foundation of conditional flow-matching-based zero-shot TTS, and fine-tune it with frame-level representation from a laughter detector as additional conditioning. With a simple scheme to mix small-scale laughter-conditioned data with large-scale pre-training data, we demonstrate that a pre-trained zero-shot TTS model can be readily fine-tuned to generate natural laughter with precise controllability, without losing any quality of the pre-trained zero-shot TTS model. Through objective and subjective evaluations, we show that ELaTE can generate laughing speech with significantly higher quality and controllability compared to conventional models. See https://aka.ms/elate/ for demo samples. △ Less

Submitted 4 March, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

Comments: See https://aka.ms/elate/ for demo samples, v2: subjective evaluation has been added

arXiv:2402.05412 [pdf]

Multi-Network Constrained Operational Optimization in Community Integrated Energy Systems: A Safe Reinforcement Learning Approach

Authors: Ze Hu, Ka Wing Chan, Ziqing Zhu, Xiang Wei, Siqi Bu

Abstract: The integrated community energy system (ICES) has emerged as a promising solution for enhancing the efficiency of the distribution system by effectively coordinating multiple energy sources. However, the operational optimization of ICES is hindered by the physical constraints of heterogeneous networks including electricity, natural gas, and heat. These challenges are difficult to address due to th… ▽ More The integrated community energy system (ICES) has emerged as a promising solution for enhancing the efficiency of the distribution system by effectively coordinating multiple energy sources. However, the operational optimization of ICES is hindered by the physical constraints of heterogeneous networks including electricity, natural gas, and heat. These challenges are difficult to address due to the non-linearity of network constraints and the high complexity of multi-network coordination. This paper, therefore, proposes a novel Safe Reinforcement Learning (SRL) algorithm to optimize the multi-network constrained operation problem of ICES. Firstly, a comprehensive ICES model is established considering integrated demand response (IDR), multiple energy devices, and network constraints. The multi-network operational optimization problem of ICES is then presented and reformulated as a constrained Markov Decision Process (C-MDP) accounting for violating physical network constraints. The proposed novel SRL algorithm, named Primal-Dual Twin Delayed Deep Deterministic Policy Gradient (PD-TD3), solves the C-MDP by employing a Lagrangian multiplier to penalize the multi-network constraint violation, ensuring that violations are within a tolerated range and avoid over-conservative strategy with a low reward at the same time. The proposed algorithm accurately estimates the cumulative reward and cost of the training process, thus achieving a fair balance between improving profits and reducing constraint violations in a privacy-protected environment with only partial information. A case study comparing the proposed algorithm with benchmark RL algorithms demonstrates the computational performance in increasing total profits and alleviating the network constraint violations. △ Less

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2401.13893 [pdf, ps, other]

A Survey on Indoor Visible Light Positioning Systems: Fundamentals, Applications, and Challenges

Authors: Zhiyu Zhu, Yang Yang, Mingzhe Chen, Caili Guo, Julian Cheng, Shuguang Cui

Abstract: The growing demand for location-based services in areas like virtual reality, robot control, and navigation has intensified the focus on indoor localization. Visible light positioning (VLP), leveraging visible light communications (VLC), becomes a promising indoor positioning technology due to its high accuracy and low cost. This paper provides a comprehensive survey of VLP systems. In particular,… ▽ More The growing demand for location-based services in areas like virtual reality, robot control, and navigation has intensified the focus on indoor localization. Visible light positioning (VLP), leveraging visible light communications (VLC), becomes a promising indoor positioning technology due to its high accuracy and low cost. This paper provides a comprehensive survey of VLP systems. In particular, since VLC lays the foundation for VLP, we first present a detailed overview of the principles of VLC. The performance of each positioning algorithm is also compared in terms of various metrics such as accuracy, coverage, and orientation limitation. Beyond the physical layer studies, the network design for a VLP system is also investigated, including multi-access technologies resource allocation, and light-emitting diode (LED) placements. Next, the applications of the VLP systems are overviewed. Finally, this paper outlines open issues, challenges, and future research directions for the research field. In a nutshell, this paper constitutes the first holistic survey on VLP from state-of-the-art studies to practical uses. △ Less

Submitted 24 January, 2024; originally announced January 2024.

arXiv:2401.10218 [pdf]

A mode-multiplexed photonic integrated vector dot-product core from inverse design

Authors: Zheyuan Zhu, Raktim Sarma, Seth Smith-Dryden, Guifang Li, Shuo Pang

Abstract: Photonic computing has the potential of harnessing the full degrees of freedom (DOFs) of the light field, including wavelength, spatial mode, spatial location, phase quadrature, and polarization, to achieve higher level of computation parallelization and scalability than digital electronic processors. While multiplexing using wavelength and other DOFs can be readily integrated on silicon photonics… ▽ More Photonic computing has the potential of harnessing the full degrees of freedom (DOFs) of the light field, including wavelength, spatial mode, spatial location, phase quadrature, and polarization, to achieve higher level of computation parallelization and scalability than digital electronic processors. While multiplexing using wavelength and other DOFs can be readily integrated on silicon photonics platforms with compact footprints, conventional mode-division multiplexed (MDM) photonic designs occupy areas exceeding tens to hundreds of microns for a few spatial modes, significantly limiting their scalability. Here we utilize inverse design to demonstrate an ultracompact photonic computing core that calculates vector dot-products based on MDM coherent mixing within a nominal footprint of 5 um x 3 um. Our dot-product core integrates the functionalities of 2 mode multiplexers and 1 multi-mode coherent mixers, all within the footprint, and could be applied to various computation and computer vision tasks, with high computing throughput density. We experimentally demonstrate computing examples on the fabricated core, including complex number multiplication and motion estimation using optical flow. △ Less

Submitted 9 February, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

Comments: 13 pages, 8 figures. Initial submission to Optica

arXiv:2401.09032 [pdf, other]

Improved Consensus ADMM for Cooperative Motion Planning of Large-Scale Connected Autonomous Vehicles with Limited Communication

Authors: Haichao Liu, Zhenmin Huang, Zicheng Zhu, Yulin Li, Shaojie Shen, Jun Ma

Abstract: This paper investigates a cooperative motion planning problem for large-scale connected autonomous vehicles (CAVs) under limited communications, which addresses the challenges of high communication and computing resource requirements. Our proposed methodology incorporates a parallel optimization algorithm with improved consensus ADMM considering a more realistic locally connected topology network,… ▽ More This paper investigates a cooperative motion planning problem for large-scale connected autonomous vehicles (CAVs) under limited communications, which addresses the challenges of high communication and computing resource requirements. Our proposed methodology incorporates a parallel optimization algorithm with improved consensus ADMM considering a more realistic locally connected topology network, and time complexity of O(N) is achieved by exploiting the sparsity in the dual update process. To further enhance the computational efficiency, we employ a lightweight evolution strategy for the dynamic connectivity graph of CAVs, and each sub-problem split from the consensus ADMM only requires managing a small group of CAVs. The proposed method implemented with the receding horizon scheme is validated thoroughly, and comparisons with existing numerical solvers and approaches demonstrate the efficiency of our proposed algorithm. Also, simulations on large-scale cooperative driving tasks involving 80 vehicles are performed in the high-fidelity CARLA simulator, which highlights the remarkable computational efficiency, scalability, and effectiveness of our proposed development. Demonstration videos are available at https://henryhcliu.github.io/icadmm_cmp_carla. △ Less

Submitted 17 January, 2024; originally announced January 2024.

Comments: 15 pages, 10 figures

arXiv:2401.08920 [pdf, other]

Idempotence and Perceptual Image Compression

Authors: Tongda Xu, Ziran Zhu, Dailan He, Yanghao Li, Lina Guo, Yuanyuan Wang, Zhe Wang, Hongwei Qin, Yan Wang, Jingjing Liu, Ya-Qin Zhang

Abstract: Idempotence is the stability of image codec to re-compression. At the first glance, it is unrelated to perceptual image compression. However, we find that theoretically: 1) Conditional generative model-based perceptual codec satisfies idempotence; 2) Unconditional generative model with idempotence constraint is equivalent to conditional generative codec. Based on this newfound equivalence, we prop… ▽ More Idempotence is the stability of image codec to re-compression. At the first glance, it is unrelated to perceptual image compression. However, we find that theoretically: 1) Conditional generative model-based perceptual codec satisfies idempotence; 2) Unconditional generative model with idempotence constraint is equivalent to conditional generative codec. Based on this newfound equivalence, we propose a new paradigm of perceptual image codec by inverting unconditional generative model with idempotence constraints. Our codec is theoretically equivalent to conditional generative codec, and it does not require training new models. Instead, it only requires a pre-trained mean-square-error codec and unconditional generative model. Empirically, we show that our proposed approach outperforms state-of-the-art methods such as HiFiC and ILLM, in terms of Fréchet Inception Distance (FID). The source code is provided in https://github.com/tongdaxu/Idempotence-and-Perceptual-Image-Compression. △ Less

Submitted 16 January, 2024; originally announced January 2024.

Comments: ICLR 2024

arXiv:2401.04953 [pdf, other]

Adaptive-avg-pooling based Attention Vision Transformer for Face Anti-spoofing

Authors: Jichen Yang, Fangfan Chen, Rohan Kumar Das, Zhengyu Zhu, Shunsi Zhang

Abstract: Traditional vision transformer consists of two parts: transformer encoder and multi-layer perception (MLP). The former plays the role of feature learning to obtain better representation, while the latter plays the role of classification. Here, the MLP is constituted of two fully connected (FC) layers, average value computing, FC layer and softmax layer. However, due to the use of average value com… ▽ More Traditional vision transformer consists of two parts: transformer encoder and multi-layer perception (MLP). The former plays the role of feature learning to obtain better representation, while the latter plays the role of classification. Here, the MLP is constituted of two fully connected (FC) layers, average value computing, FC layer and softmax layer. However, due to the use of average value computing module, some useful information may get lost, which we plan to preserve by the use of alternative framework. In this work, we propose a novel vision transformer referred to as adaptive-avg-pooling based attention vision transformer (AAViT) that uses modules of adaptive average pooling and attention to replace the module of average value computing. We explore the proposed AAViT for the studies on face anti-spoofing using Replay-Attack database. The experiments show that the AAViT outperforms vision transformer in face anti-spoofing by producing a reduced equal error rate. In addition, we found that the proposed AAViT can perform much better than some commonly used neural networks such as ResNet and some other known systems on the Replay-Attack corpus. △ Less

Submitted 10 January, 2024; originally announced January 2024.

Comments: Accepted for Publication in IEEE ICASSP 2024

arXiv:2401.03375 [pdf, other]

doi 10.1109/TITS.2023.3343196

Real-Time Asphalt Pavement Layer Thickness Prediction Using Ground-Penetrating Radar Based on a Modified Extended Common Mid-Point (XCMP) Approach

Authors: Siqi Wang, Zhen Leng, Xin Sui, Weiguang Zhang, Tao Ma, Zehui Zhu

Abstract: The conventional surface reflection method has been widely used to measure the asphalt pavement layer dielectric constant using ground-penetrating radar (GPR). This method may be inaccurate for in-service pavement thickness estimation with dielectric constant variation through the depth, which could be addressed using the extended common mid-point method (XCMP) with air-coupled GPR antennas. Howev… ▽ More The conventional surface reflection method has been widely used to measure the asphalt pavement layer dielectric constant using ground-penetrating radar (GPR). This method may be inaccurate for in-service pavement thickness estimation with dielectric constant variation through the depth, which could be addressed using the extended common mid-point method (XCMP) with air-coupled GPR antennas. However, the factors affecting the XCMP method on thickness prediction accuracy haven't been studied. Manual acquisition of key factors is required, which hinders its real-time applications. This study investigates the affecting factors and develops a modified XCMP method to allow automatic thickness prediction of in-service asphalt pavement with non-uniform dielectric properties through depth. A sensitivity analysis was performed, necessitating the accurate estimation of time of flights (TOFs) from antenna pairs. A modified XCMP method based on edge detection was proposed to allow real-time TOFs estimation, then dielectric constant and thickness predictions. Field tests using a multi-channel GPR system were performed for validation. Both the surface reflection and XCMP setups were conducted. Results show that the modified XCMP method is recommended with a mean prediction error of 1.86%, which is more accurate than the surface reflection method (5.73%). △ Less

Submitted 6 January, 2024; originally announced January 2024.

Comments: IEEE Transactions on Intelligent Transportation Systems (2024)

arXiv:2401.02592 [pdf, other]

Guaranteed Nonconvex Factorization Approach for Tensor Train Recovery

Authors: Zhen Qin, Michael B. Wakin, Zhihui Zhu

Abstract: In this paper, we provide the first convergence guarantee for the factorization approach. Specifically, to avoid the scaling ambiguity and to facilitate theoretical analysis, we optimize over the so-called left-orthogonal TT format which enforces orthonormality among most of the factors. To ensure the orthonormal structure, we utilize the Riemannian gradient descent (RGD) for optimizing those fact… ▽ More In this paper, we provide the first convergence guarantee for the factorization approach. Specifically, to avoid the scaling ambiguity and to facilitate theoretical analysis, we optimize over the so-called left-orthogonal TT format which enforces orthonormality among most of the factors. To ensure the orthonormal structure, we utilize the Riemannian gradient descent (RGD) for optimizing those factors over the Stiefel manifold. We first delve into the TT factorization problem and establish the local linear convergence of RGD. Notably, the rate of convergence only experiences a linear decline as the tensor order increases. We then study the sensing problem that aims to recover a TT format tensor from linear measurements. Assuming the sensing operator satisfies the restricted isometry property (RIP), we show that with a proper initialization, which could be obtained through spectral initialization, RGD also converges to the ground-truth tensor at a linear rate. Furthermore, we expand our analysis to encompass scenarios involving Gaussian noise in the measurements. We prove that RGD can reliably recover the ground truth at a linear rate, with the recovery error exhibiting only polynomial growth in relation to the tensor order. We conduct various experiments to validate our theoretical findings. △ Less

Submitted 4 January, 2024; originally announced January 2024.

arXiv:2401.00194 [pdf, other]

On the Identifiability from Modulo Measurements under DFT Sensing Matrix

Authors: Qi Zhang, Jiang Zhu, Fengzhong Qu, Zheng Zhu, De Wen Soh

Abstract: Modulo sampling (MS) has been recently introduced to enhance the dynamic range of conventional ADCs by applying a modulo operator before sampling. This paper examines the identifiability of a measurement model where measurements are taken using a discrete Fourier transform (DFT) sensing matrix, followed by a modulo operator (modulo-DFT). Firstly, we derive a necessary and sufficient condition for… ▽ More Modulo sampling (MS) has been recently introduced to enhance the dynamic range of conventional ADCs by applying a modulo operator before sampling. This paper examines the identifiability of a measurement model where measurements are taken using a discrete Fourier transform (DFT) sensing matrix, followed by a modulo operator (modulo-DFT). Firstly, we derive a necessary and sufficient condition for the unique identification of the modulo-DFT sensing model based on the number of measurements and the indices of zero elements in the original signal. Then, we conduct a deeper analysis of three specific cases: when the number of measurements is a power of $2$, a prime number, and twice a prime number. Additionally, we investigate the identifiability of periodic bandlimited (PBL) signals under MS, which can be considered as the modulo-DFT sensing model with additional symmetric and conjugate constraints on the original signal. We also provide a necessary and sufficient condition based solely on the number of samples in one period for the unique identification of the PBL signal under MS, though with an ambiguity in the direct current (DC) component. Furthermore, we show that when the oversampling factor exceeds $3(1+1/P)$, the PBL signal can be uniquely identified with an ambiguity in the DC component, where $P$ is the number of harmonics, including the fundamental component, in the positive frequency part. Finally, we also present a recovery algorithm that estimates the original signal by solving integer linear equations, and we conduct simulations to validate our conclusions. △ Less

Submitted 6 August, 2024; v1 submitted 30 December, 2023; originally announced January 2024.

arXiv:2311.12852 [pdf, ps, other]

Cell-free Terahertz Networks: A Spatial-spectral Approach

Authors: Zesheng Zhu, Lifeng Wang, Xin Wang, Bo Tan, Shi Jin

Abstract: Cell-free network architecture plays a promising role in the terahertz (THz) networks since it provides better link reliability and uniformly good services for all the users compared to the co-located massive MIMO counterpart, and the spatial-spectral THz link has the advantages of lower initial access latency and fast beam operations. To this end, this work studies cell-free spatial-spectral THz… ▽ More Cell-free network architecture plays a promising role in the terahertz (THz) networks since it provides better link reliability and uniformly good services for all the users compared to the co-located massive MIMO counterpart, and the spatial-spectral THz link has the advantages of lower initial access latency and fast beam operations. To this end, this work studies cell-free spatial-spectral THz networks with leaky-wave antennas, to exploit the benefits of leveraging both cell-free and spatial-spectral THz technologies. By addressing the coupling effects between propagation angles and frequencies, we propose novel frequency-dependent THz transmit antenna selection schemes to maximize the transmission rate. Numerical results confirm that the proposed antenna selection schemes can achieve much larger transmission rate than the maximal ratio transmission of using all the transmit antennas with equal subchannel bandwidth allocation in higher THz frequencies. △ Less

Submitted 21 October, 2023; originally announced November 2023.

arXiv:2311.09775 [pdf, other]

MEGA: A Memory-Efficient GNN Accelerator Exploiting Degree-Aware Mixed-Precision Quantization

Authors: Zeyu Zhu, Fanrong Li, Gang Li, Zejian Liu, Zitao Mo, Qinghao Hu, Xiaoyao Liang, Jian Cheng

Abstract: Graph Neural Networks (GNNs) are becoming a promising technique in various domains due to their excellent capabilities in modeling non-Euclidean data. Although a spectrum of accelerators has been proposed to accelerate the inference of GNNs, our analysis demonstrates that the latency and energy consumption induced by DRAM access still significantly impedes the improvement of performance and energy… ▽ More Graph Neural Networks (GNNs) are becoming a promising technique in various domains due to their excellent capabilities in modeling non-Euclidean data. Although a spectrum of accelerators has been proposed to accelerate the inference of GNNs, our analysis demonstrates that the latency and energy consumption induced by DRAM access still significantly impedes the improvement of performance and energy efficiency. To address this issue, we propose a Memory-Efficient GNN Accelerator (MEGA) through algorithm and hardware co-design in this work. Specifically, at the algorithm level, through an in-depth analysis of the node property, we observe that the data-independent quantization in previous works is not optimal in terms of accuracy and memory efficiency. This motivates us to propose the Degree-Aware mixed-precision quantization method, in which a proper bitwidth is learned and allocated to a node according to its in-degree to compress GNNs as much as possible while maintaining accuracy. At the hardware level, we employ a heterogeneous architecture design in which the aggregation and combination phases are implemented separately with different dataflows. In order to boost the performance and energy efficiency, we also present an Adaptive-Package format to alleviate the storage overhead caused by the fine-grained bitwidth and diverse sparsity, and a Condense-Edge scheduling method to enhance the data locality and further alleviate the access irregularity induced by the extremely sparse adjacency matrix in the graph. We implement our MEGA accelerator in a 28nm technology node. Extensive experiments demonstrate that MEGA can achieve an average speedup of 38.3x, 7.1x, 4.0x, 3.6x and 47.6x, 7.2x, 5.4x, 4.5x energy savings over four state-of-the-art GNN accelerators, HyGCN, GCNAX, GROW, and SGCN, respectively, while retaining task accuracy. △ Less

Submitted 16 November, 2023; originally announced November 2023.

Comments: 15pages, 22 figures. Accepted at HPCA 2024

arXiv:2311.08153 [pdf, other]

When Mining Electric Locomotives Meet Reinforcement Learning

Authors: Ying Li, Zhencai Zhu, Xiaoqiang Li, Chunyu Yang, Hao Lu

Abstract: As the most important auxiliary transportation equipment in coal mines, mining electric locomotives are mostly operated manually at present. However, due to the complex and ever-changing coal mine environment, electric locomotive safety accidents occur frequently these years. A mining electric locomotive control method that can adapt to different complex mining environments is needed. Reinforcemen… ▽ More As the most important auxiliary transportation equipment in coal mines, mining electric locomotives are mostly operated manually at present. However, due to the complex and ever-changing coal mine environment, electric locomotive safety accidents occur frequently these years. A mining electric locomotive control method that can adapt to different complex mining environments is needed. Reinforcement Learning (RL) is concerned with how artificial agents ought to take actions in an environment so as to maximize reward, which can help achieve automatic control of mining electric locomotive. In this paper, we present how to apply RL to the autonomous control of mining electric locomotives. To achieve more precise control, we further propose an improved epsilon-greedy (IEG) algorithm which can better balance the exploration and exploitation. To verify the effectiveness of this method, a co-simulation platform for autonomous control of mining electric locomotives is built which can complete closed-loop simulation of the vehicles. The simulation results show that this method ensures the locomotives following the front vehicle safely and responding promptly in the event of sudden obstacles on the road when the vehicle in complex and uncertain coal mine environments. △ Less

Submitted 14 November, 2023; originally announced November 2023.

arXiv:2310.18222 [pdf]

doi 10.1002/ENG2.12815

TBDLNet: a network for classifying multidrug-resistant and drug-sensitive tuberculosis

Authors: Ziquan Zhu, Jing Tao, Shuihua Wang, Xin Zhang, Yudong Zhang

Abstract: This paper proposes applying a novel deep-learning model, TBDLNet, to recognize CT images to classify multidrug-resistant and drug-sensitive tuberculosis automatically. The pre-trained ResNet50 is selected to extract features. Three randomized neural networks are used to alleviate the overfitting problem. The ensemble of three RNNs is applied to boost the robustness via majority voting. The propos… ▽ More This paper proposes applying a novel deep-learning model, TBDLNet, to recognize CT images to classify multidrug-resistant and drug-sensitive tuberculosis automatically. The pre-trained ResNet50 is selected to extract features. Three randomized neural networks are used to alleviate the overfitting problem. The ensemble of three RNNs is applied to boost the robustness via majority voting. The proposed model is evaluated by five-fold cross-validation. Five indexes are selected in this paper, which are accuracy, sensitivity, precision, F1-score, and specificity. The TBDLNet achieves 0.9822 accuracy, 0.9815 specificity, 0.9823 precision, 0.9829 sensitivity, and 0.9826 F1-score, respectively. The TBDLNet is suitable for classifying multidrug-resistant tuberculosis and drug-sensitive tuberculosis. It can detect multidrug-resistant pulmonary tuberculosis as early as possible, which helps to adjust the treatment plan in time and improve the treatment effect. △ Less

Submitted 27 October, 2023; originally announced October 2023.

Journal ref: Engineering Reports (2023)

arXiv:2310.10997 [pdf]

Cooperative Dispatch of Microgrids Community Using Risk-Sensitive Reinforcement Learning with Monotonously Improved Performance

Authors: Ziqing Zhu, Xiang Gao, Siqi Bu, Ka Wing Chan, Bin Zhou, Shiwei Xia

Abstract: The integration of individual microgrids (MGs) into Microgrid Clusters (MGCs) significantly improves the reliability and flexibility of energy supply, through resource sharing and ensuring backup during outages. The dispatch of MGCs is the key challenge to be tackled to ensure their secure and economic operation. Currently, there is a lack of optimization method that can achieve a trade-off among… ▽ More The integration of individual microgrids (MGs) into Microgrid Clusters (MGCs) significantly improves the reliability and flexibility of energy supply, through resource sharing and ensuring backup during outages. The dispatch of MGCs is the key challenge to be tackled to ensure their secure and economic operation. Currently, there is a lack of optimization method that can achieve a trade-off among top-priority requirements of MGCs' dispatch, including fast computation speed, optimality, multiple objectives, and risk mitigation against uncertainty. In this paper, a novel Multi-Objective, Risk-Sensitive, and Online Trust Region Policy Optimization (RS-TRPO) Algorithm is proposed to tackle this problem. First, a dispatch paradigm for autonomous MGs in the MGC is proposed, enabling them sequentially implement their self-dispatch to mitigate potential conflicts. This dispatch paradigm is then formulated as a Markov Game model, which is finally solved by the RS-TRPO algorithm. This online algorithm enables MGs to spontaneously search for the Pareto Frontier considering multiple objectives and risk mitigation. The outstanding computational performance of this algorithm is demonstrated in comparison with mathematical programming methods and heuristic algorithms in a modified IEEE 30-Bus Test System integrated with four autonomous MGs. △ Less

Submitted 17 October, 2023; originally announced October 2023.

arXiv:2310.09918 [pdf, other]

Pedestrian Accessible Infrastructure Inventory: Assessing Zero-Shot Segmentation on Multi-Mode Geospatial Data for All Pedestrian Types

Authors: Jiahao Xia, Gavin Gong, Jiawei Liu, Zhigang Zhu, Hao Tang

Abstract: In this paper, a Segment Anything Model (SAM)-based pedestrian infrastructure segmentation workflow is designed and optimized, which is capable of efficiently processing multi-sourced geospatial data including LiDAR data and satellite imagery data. We used an expanded definition of pedestrian infrastructure inventory which goes beyond the traditional transportation elements to include street furni… ▽ More In this paper, a Segment Anything Model (SAM)-based pedestrian infrastructure segmentation workflow is designed and optimized, which is capable of efficiently processing multi-sourced geospatial data including LiDAR data and satellite imagery data. We used an expanded definition of pedestrian infrastructure inventory which goes beyond the traditional transportation elements to include street furniture objects that are important for accessibility but are often omitted from the traditional definition. Our contributions lie in producing the necessary knowledge to answer the following two questions. First, which data representation can facilitate zero-shot segmentation of infrastructure objects with SAM? Second, how well does the SAM-based method perform on segmenting pedestrian infrastructure objects? Our findings indicate that street view images generated from mobile LiDAR point cloud data, when paired along with satellite imagery data, can work efficiently with SAM to create a scalable pedestrian infrastructure inventory approach with immediate benefits to GIS professionals, city managers, transportation owners, and walkers, especially those with travel-limiting disabilities, such as individuals who are blind, have low vision, or experience mobility disabilities. △ Less

Submitted 27 November, 2023; v1 submitted 15 October, 2023; originally announced October 2023.

arXiv:2309.06909 [pdf, other]

Intelligent Reflective Surface Assisted Integrated Sensing and Wireless Power Transfer

Authors: Zheng Li, Zhengyu Zhu, Zheng Chu, Yingying Guan, De Mi, Fan Liu, Lie-Liang Yang

Abstract: Wireless sensing and wireless energy are enablers to pave the way for smart transportation and a greener future. In this paper, an intelligent reflecting surface (IRS) assisted integrated sensing and wireless power transfer (ISWPT) system is investigated, where the transmitter in transportation infrastructure networks sends signals to sense multiple targets and simultaneously to multiple energy ha… ▽ More Wireless sensing and wireless energy are enablers to pave the way for smart transportation and a greener future. In this paper, an intelligent reflecting surface (IRS) assisted integrated sensing and wireless power transfer (ISWPT) system is investigated, where the transmitter in transportation infrastructure networks sends signals to sense multiple targets and simultaneously to multiple energy harvesting devices (EHDs) to power them. In light of the performance tradeoff between energy harvesting and sensing, we propose to jointly optimize the system performance via optimizing the beamforming and IRS phase shift. However, the coupling of optimization variables makes the formulated problem non-convex. Thus, an alternative optimization approach is introduced and based on which two algorithms are proposed to solve the problem. Specifically, the first one involves a semi-definite program technique, while the second one features a low-complexity optimization algorithm based on successive convex approximation and majorization minimization. Our simulation results validate the proposed algorithms and demonstrate the advantages of using IRS to assist wireless power transfer in ISWPT systems. △ Less

Submitted 30 November, 2023; v1 submitted 13 September, 2023; originally announced September 2023.

Comments: Firstly,the simulation has some error and is needed to checked. Secondly, the authors relationship needs to be corrected between zheng li and zheng chu

ACM Class: H.4.3

arXiv:2309.05446 [pdf, other]

A Localization-to-Segmentation Framework for Automatic Tumor Segmentation in Whole-Body PET/CT Images

Authors: Linghan Cai, Jianhao Huang, Zihang Zhu, Jinpeng Lu, Yongbing Zhang

Abstract: Fluorodeoxyglucose (FDG) positron emission tomography (PET) combined with computed tomography (CT) is considered the primary solution for detecting some cancers, such as lung cancer and melanoma. Automatic segmentation of tumors in PET/CT images can help reduce doctors' workload, thereby improving diagnostic quality. However, precise tumor segmentation is challenging due to the small size of many… ▽ More Fluorodeoxyglucose (FDG) positron emission tomography (PET) combined with computed tomography (CT) is considered the primary solution for detecting some cancers, such as lung cancer and melanoma. Automatic segmentation of tumors in PET/CT images can help reduce doctors' workload, thereby improving diagnostic quality. However, precise tumor segmentation is challenging due to the small size of many tumors and the similarity of high-uptake normal areas to the tumor regions. To address these issues, this paper proposes a localization-to-segmentation framework (L2SNet) for precise tumor segmentation. L2SNet first localizes the possible lesions in the lesion localization phase and then uses the location cues to shape the segmentation results in the lesion segmentation phase. To further improve the segmentation performance of L2SNet, we design an adaptive threshold scheme that takes the segmentation results of the two phases into consideration. The experiments with the MICCAI 2023 Automated Lesion Segmentation in Whole-Body FDG-PET/CT challenge dataset show that our method achieved a competitive result and was ranked in the top 7 methods on the preliminary test set. Our work is available at: https://github.com/MedCAI/L2SNet. △ Less

Submitted 14 September, 2023; v1 submitted 11 September, 2023; originally announced September 2023.

Comments: 7 pages,3 figures

arXiv:2308.06064 [pdf, other]

Joint Beamforming Optimization for Active STAR-RIS Assisted ISAC systems

Authors: Shuang Zhang, Wanming Hao, Gangcan Sun, Chongwen Huang, Zhengyu Zhu, Xingwang Li, Chau Yuen

Abstract: In this paper, we investigate an active simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) assisted integrated sensing and communications (ISAC) system, in which a dual-function base station (DFBS) equipped with multiple antennas provides communication services for multiple users with the assistance of an active STARRIS and performs target sensing simultaneous… ▽ More In this paper, we investigate an active simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) assisted integrated sensing and communications (ISAC) system, in which a dual-function base station (DFBS) equipped with multiple antennas provides communication services for multiple users with the assistance of an active STARRIS and performs target sensing simultaneously. Through optimizing both the DFBS and STAR-RIS beamforming jointly under different work modes, our purpose is to achieve the maximized communication sum rate, subject to the minimum radar signal-to-noise ratio (SNR) constraint, active STAR-RIS hardware constraints, and total power constraint of DFBS and active STAR-RIS. To solve the non-convex optimization problem formulated, an efficient alternating optimization algorithm is proposed. Specifically, the fractional programming scheme is first leveraged to turn the original problem into a structure with more tractable, and subsequently the transformed problem is decomposed into multiple sub-problems. Next, we develop a derivation method to obtain the closed expression of the radar receiving beamforming, and then the DFBS transmit beamforming is optimized under the radar SNR requirement and total power constraint. After that, the active STAR-RIS reflection and transmission beamforming are optimized by majorization minimiation, complex circle manifold and convex optimization techniques. Finally, the proposed schemes are conducted through numerical simulations to show their benefits and efficiency. △ Less

Submitted 11 August, 2023; originally announced August 2023.

arXiv:2307.14335 [pdf, other]

WavJourney: Compositional Audio Creation with Large Language Models

Authors: Xubo Liu, Zhongkai Zhu, Haohe Liu, Yi Yuan, Meng Cui, Qiushi Huang, Jinhua Liang, Yin Cao, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang

Abstract: Despite breakthroughs in audio generation models, their capabilities are often confined to domain-specific conditions such as speech transcriptions and audio captions. However, real-world audio creation aims to generate harmonious audio containing various elements such as speech, music, and sound effects with controllable conditions, which is challenging to address using existing audio generation… ▽ More Despite breakthroughs in audio generation models, their capabilities are often confined to domain-specific conditions such as speech transcriptions and audio captions. However, real-world audio creation aims to generate harmonious audio containing various elements such as speech, music, and sound effects with controllable conditions, which is challenging to address using existing audio generation systems. We present WavJourney, a novel framework that leverages Large Language Models (LLMs) to connect various audio models for audio creation. WavJourney allows users to create storytelling audio content with diverse audio elements simply from textual descriptions. Specifically, given a text instruction, WavJourney first prompts LLMs to generate an audio script that serves as a structured semantic representation of audio elements. The audio script is then converted into a computer program, where each line of the program calls a task-specific audio generation model or computational operation function. The computer program is then executed to obtain a compositional and interpretable solution for audio creation. Experimental results suggest that WavJourney is capable of synthesizing realistic audio aligned with textually-described semantic, spatial and temporal conditions, achieving state-of-the-art results on text-to-audio generation benchmarks. Additionally, we introduce a new multi-genre story benchmark. Subjective evaluations demonstrate the potential of WavJourney in crafting engaging storytelling audio content from text. We further demonstrate that WavJourney can facilitate human-machine co-creation in multi-round dialogues. To foster future research, the code and synthesized audio are available at: https://audio-agi.github.io/WavJourney_demopage/. △ Less

Submitted 26 November, 2023; v1 submitted 26 July, 2023; originally announced July 2023.

Comments: GitHub: https://github.com/Audio-AGI/WavJourney

arXiv:2307.12262 [pdf, other]

A meta learning scheme for fast accent domain expansion in Mandarin speech recognition

Authors: Ziwei Zhu, Changhao Shan, Bihong Zhang, Jian Yu

Abstract: Spoken languages show significant variation across mandarin and accent. Despite the high performance of mandarin automatic speech recognition (ASR), accent ASR is still a challenge task. In this paper, we introduce meta-learning techniques for fast accent domain expansion in mandarin speech recognition, which expands the field of accents without deteriorating the performance of mandarin ASR. Meta-… ▽ More Spoken languages show significant variation across mandarin and accent. Despite the high performance of mandarin automatic speech recognition (ASR), accent ASR is still a challenge task. In this paper, we introduce meta-learning techniques for fast accent domain expansion in mandarin speech recognition, which expands the field of accents without deteriorating the performance of mandarin ASR. Meta-learning or learn-to-learn can learn general relation in multi domains not only for over-fitting a specific domain. So we select meta-learning in the domain expansion task. This more essential learning will cause improved performance on accent domain extension tasks. We combine the methods of meta learning and freeze of model parameters, which makes the recognition performance more stable in different cases and the training faster about 20%. Our approach significantly outperforms other methods about 3% relatively in the accent domain expansion task. Compared to the baseline model, it improves relatively 37% under the condition that the mandarin test set remains unchanged. In addition, it also proved this method to be effective on a large amount of data with a relative performance improvement of 4% on the accent test set. △ Less

Submitted 23 July, 2023; originally announced July 2023.

arXiv:2307.01486 [pdf, other]

H-DenseFormer: An Efficient Hybrid Densely Connected Transformer for Multimodal Tumor Segmentation

Authors: Jun Shi, Hongyu Kan, Shulan Ruan, Ziqi Zhu, Minfan Zhao, Liang Qiao, Zhaohui Wang, Hong An, Xudong Xue

Abstract: Recently, deep learning methods have been widely used for tumor segmentation of multimodal medical images with promising results. However, most existing methods are limited by insufficient representational ability, specific modality number and high computational complexity. In this paper, we propose a hybrid densely connected network for tumor segmentation, named H-DenseFormer, which combines the… ▽ More Recently, deep learning methods have been widely used for tumor segmentation of multimodal medical images with promising results. However, most existing methods are limited by insufficient representational ability, specific modality number and high computational complexity. In this paper, we propose a hybrid densely connected network for tumor segmentation, named H-DenseFormer, which combines the representational power of the Convolutional Neural Network (CNN) and the Transformer structures. Specifically, H-DenseFormer integrates a Transformer-based Multi-path Parallel Embedding (MPE) module that can take an arbitrary number of modalities as input to extract the fusion features from different modalities. Then, the multimodal fusion features are delivered to different levels of the encoder to enhance multimodal learning representation. Besides, we design a lightweight Densely Connected Transformer (DCT) block to replace the standard Transformer block, thus significantly reducing computational complexity. We conduct extensive experiments on two public multimodal datasets, HECKTOR21 and PI-CAI22. The experimental results show that our proposed method outperforms the existing state-of-the-art methods while having lower computational complexity. The source code is available at https://github.com/shijun18/H-DenseFormer. △ Less

Submitted 4 July, 2023; originally announced July 2023.

Comments: 11 pages, 2 figures. This paper has been accepted by Medical Image Computing and Computer-Assisted Intervention(MICCAI) 2023

arXiv:2306.12099 [pdf, ps, other]

Anti-jamming Method for SAR Using Joint Waveform Modulation and Azimuth Mismatched Filtering

Authors: Zhuan Sun, Zhanyu Zhu

Abstract: High-fidelity deception jamming can seriously mislead Synthetic Aperture Radar (SAR) image interpretation and target detection, which is difficult to identify or eliminate through traditional anti-jamming methods. Based on the Range-Doppler Algorithm (RDA), an anti-jamming method for SAR by using joint waveform modulation and azimuth mismatched filtering is proposed in this paper. The signal model… ▽ More High-fidelity deception jamming can seriously mislead Synthetic Aperture Radar (SAR) image interpretation and target detection, which is difficult to identify or eliminate through traditional anti-jamming methods. Based on the Range-Doppler Algorithm (RDA), an anti-jamming method for SAR by using joint waveform modulation and azimuth mismatched filtering is proposed in this paper. The signal model of SAR echo after pulse compression with deception jamming is derived from the radar detection and jamming characteristics. A multi-objective cost function is introduced to suppress the jamming level and keep the real target information, parameterized with the waveform initial phase and the azimuth mismatched filter coefficients, which is optimized using the second-order Taylor expansion approximation and the alternating direction multiplier method (ADMM). The performance of this proposed method is evaluated through the convergence analysis and anti-jamming experiments on the point target and the distributed target scenarios. △ Less

Submitted 21 June, 2023; originally announced June 2023.

arXiv:2306.09432 [pdf, other]

Quantum State Tomography for Matrix Product Density Operators

Authors: Zhen Qin, Casey Jameson, Zhexuan Gong, Michael B. Wakin, Zhihui Zhu

Abstract: The reconstruction of quantum states from experimental measurements, often achieved using quantum state tomography (QST), is crucial for the verification and benchmarking of quantum devices. However, performing QST for a generic unstructured quantum state requires an enormous number of state copies that grows \emph{exponentially} with the number of individual quanta in the system, even for the mos… ▽ More The reconstruction of quantum states from experimental measurements, often achieved using quantum state tomography (QST), is crucial for the verification and benchmarking of quantum devices. However, performing QST for a generic unstructured quantum state requires an enormous number of state copies that grows \emph{exponentially} with the number of individual quanta in the system, even for the most optimal measurement settings. Fortunately, many physical quantum states, such as states generated by noisy, intermediate-scale quantum computers, are usually structured. In one dimension, such states are expected to be well approximated by matrix product operators (MPOs) with a finite matrix/bond dimension independent of the number of qubits, therefore enabling efficient state representation. Nevertheless, it is still unclear whether efficient QST can be performed for these states in general. In this paper, we attempt to bridge this gap and establish theoretical guarantees for the stable recovery of MPOs using tools from compressive sensing and the theory of empirical processes. We begin by studying two types of random measurement settings: Gaussian measurements and Haar random rank-one Positive Operator Valued Measures (POVMs). We show that the information contained in an MPO with a finite bond dimension can be preserved using a number of random measurements that depends only \emph{linearly} on the number of qubits, assuming no statistical error of the measurements. We then study MPO-based QST with physical quantum measurements through Haar random rank-one POVMs that can be implemented on quantum computers. We prove that only a \emph{polynomial} number of state copies in the number of qubits is required to guarantee bounded recovery error of an MPO state. △ Less

Submitted 18 February, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

arXiv:2306.04554 [pdf]

Integrated Photonic Encoder for Terapixel Image Processing

Authors: Xiao Wang, Brandon Redding, Nicholas Karl, Christopher Long, Zheyuan Zhu, Shuo Pang, David Brady, Raktim Sarma

Abstract: Modern lens designs are capable of resolving >10 gigapixels, while advances in camera frame-rate and hyperspectral imaging have made Terapixel/s data acquisition a real possibility. The main bottlenecks preventing such high data-rate systems are power consumption and data storage. In this work, we show that analog photonic encoders could address this challenge, enabling high-speed image compressio… ▽ More Modern lens designs are capable of resolving >10 gigapixels, while advances in camera frame-rate and hyperspectral imaging have made Terapixel/s data acquisition a real possibility. The main bottlenecks preventing such high data-rate systems are power consumption and data storage. In this work, we show that analog photonic encoders could address this challenge, enabling high-speed image compression using orders-of-magnitude lower power than digital electronics. Our approach relies on a silicon-photonics front-end to compress raw image data, foregoing energy-intensive image conditioning and reducing data storage requirements. The compression scheme uses a passive disordered photonic structure to perform kernel-type random projections of the raw image data with minimal power consumption and low latency. A back-end neural network can then reconstruct the original images with structural similarity exceeding 90%. This scheme has the potential to process Terapixel/s data streams using less than 100 fJ/pixel, providing a path to ultra-high-resolution data and image acquisition systems. △ Less

Submitted 7 June, 2023; originally announced June 2023.

arXiv:2305.10925 [pdf, other]

Unsupervised Hyperspectral Pansharpening via Low-rank Diffusion Model

Authors: Xiangyu Rui, Xiangyong Cao, Li Pang, Zeyu Zhu, Zongsheng Yue, Deyu Meng

Abstract: Hyperspectral pansharpening is a process of merging a high-resolution panchromatic (PAN) image and a low-resolution hyperspectral (LRHS) image to create a single high-resolution hyperspectral (HRHS) image. Existing Bayesian-based HS pansharpening methods require designing handcraft image prior to characterize the image features, and deep learning-based HS pansharpening methods usually require a la… ▽ More Hyperspectral pansharpening is a process of merging a high-resolution panchromatic (PAN) image and a low-resolution hyperspectral (LRHS) image to create a single high-resolution hyperspectral (HRHS) image. Existing Bayesian-based HS pansharpening methods require designing handcraft image prior to characterize the image features, and deep learning-based HS pansharpening methods usually require a large number of paired training data and suffer from poor generalization ability. To address these issues, in this work, we propose a low-rank diffusion model for hyperspectral pansharpening by simultaneously leveraging the power of the pre-trained deep diffusion model and better generalization ability of Bayesian methods. Specifically, we assume that the HRHS image can be recovered from the product of two low-rank tensors, i.e., the base tensor and the coefficient matrix. The base tensor lies on the image field and has a low spectral dimension. Thus, we can conveniently utilize a pre-trained remote sensing diffusion model to capture its image structures. Additionally, we derive a simple yet quite effective way to pre-estimate the coefficient matrix from the observed LRHS image, which preserves the spectral information of the HRHS. Experimental results demonstrate that the proposed method performs better than some popular traditional approaches and gains better generalization ability than some DL-based methods. The code is released in https://github.com/xyrui/PLRDiff. △ Less

Submitted 19 November, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

arXiv:2305.08527 [pdf, other]

Sum Secrecy Rate Maximization for IRS-aided Multi-Cluster MIMO-NOMA Terahertz Systems

Authors: Jinlei Xu, Zhengyu Zhu, Zheng Chu, Hehao Niu, Pei Xiao, Inkyu Lee

Abstract: Intelligent reflecting surface (IRS) is a promising technique to extend the network coverage and improve spectral efficiency. This paper investigates an IRS-assisted terahertz (THz) multiple-input multiple-output (MIMO)-nonorthogonal multiple access (NOMA) system based on hybrid precoding with the presence of eavesdropper. Two types of sparse RF chain antenna structures are adopted, i.e., sub-conn… ▽ More Intelligent reflecting surface (IRS) is a promising technique to extend the network coverage and improve spectral efficiency. This paper investigates an IRS-assisted terahertz (THz) multiple-input multiple-output (MIMO)-nonorthogonal multiple access (NOMA) system based on hybrid precoding with the presence of eavesdropper. Two types of sparse RF chain antenna structures are adopted, i.e., sub-connected structure and fully connected structure. First, cluster heads are selected for each beam, and analog precoding based on discrete phase is designed. Then, users are clustered based on channel correlation, and NOMA technology is employed to serve the users. In addition, a low-complexity forced-zero method is utilized to design digital precoding in order to eliminate inter-cluster interference. On this basis, we propose a secure transmission scheme to maximize the sum secrecy rate by jointly optimizing the power allocation and phase shifts of IRS subject to the total transmit power budget, minimal achievable rate requirement of each user, and IRS reflection coefficients. Due to multiple coupled variables, the formulated problem leads to a non-convex issue. We apply the Taylor series expansion and semidefinite programming to convert the original non-convex problem into a convex one. Then, an alternating optimization algorithm is developed to obtain a feasible solution of the original problem. Simulation results verify the convergence of the proposed algorithm, and deploying IRS can bring significant beamforming gains to suppress the eavesdropping. △ Less

Submitted 11 June, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

Comments: 11 pages, 8 figure; references added

arXiv:2305.02485

How to Use Reinforcement Learning to Facilitate Future Electricity Market Design? Part 1: A Paradigmatic Theory

Authors: Ziqing Zhu, Siqi Bu, Ka Wing Chan, Bin Zhou, Shiwei Xia

Abstract: In face of the pressing need of decarbonization in the power sector, the re-design of electricity market is necessary as a Marco-level approach to accommodate the high penetration of renewable generations, and to achieve power system operation security, economic efficiency, and environmental friendliness. However, existing market design methodologies suffer from the lack of coordination among ener… ▽ More In face of the pressing need of decarbonization in the power sector, the re-design of electricity market is necessary as a Marco-level approach to accommodate the high penetration of renewable generations, and to achieve power system operation security, economic efficiency, and environmental friendliness. However, existing market design methodologies suffer from the lack of coordination among energy spot market (ESM), ancillary service market (ASM) and financial market (FM), i.e., the "joint market", and the lack of reliable simulation-based verification. To tackle these deficiencies, this two-part paper develops a paradigmatic theory and detailed methods of the joint market design using reinforcement-learning (RL)-based simulation. In Part 1, the theory and framework of this novel market design philosophy are proposed. First, the controversial market design options while designing the joint market are summarized as the targeted research questions. Second, the Markov game model is developed to describe the bidding game in the joint market, incorporating the market design options to be determined. Third, a framework of deploying multiple types of RL algorithms to simulate the market model is developed. Finally, several market operation performance indicators are proposed to validate the market design based on the simulation results. △ Less

Submitted 11 May, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

Comments: It is old version with mistakes

arXiv:2304.06042 [pdf]

A physical neural network training approach toward multi-plane light conversion design

Authors: Zheyuan Zhu, Joe H. Doerr, Guifang Li, Shuo Pang

Abstract: Multi-plane light converter (MPLC) designs supporting hundreds of modes are attractive in high-throughput optical communications. These photonic structures typically comprise >10 phase masks in free space, with millions of independent design parameters. Conventional MPLC design using wavefront matching updates one mask at a time while fixing the rest. Here we construct a physical neural network (P… ▽ More Multi-plane light converter (MPLC) designs supporting hundreds of modes are attractive in high-throughput optical communications. These photonic structures typically comprise >10 phase masks in free space, with millions of independent design parameters. Conventional MPLC design using wavefront matching updates one mask at a time while fixing the rest. Here we construct a physical neural network (PNN) to model the light propagation and phase modulation in MPLC, providing access to the entire parameter set for optimization, including not only profiles of the phase masks and the distances between them. PNN training supports flexible optimization sequences and is a superset of existing MPLC design methods. In addition, our method allows tuning of hyperparameters of PNN training such as learning rate and batch size. Because PNN-based MPLC is found to be insensitive to the number of input and target modes in each training step, we have demonstrated a high-order MPLC design (45 modes) using mini batches that fit into the available computing resources. △ Less

Submitted 5 April, 2023; originally announced April 2023.

Comments: Draft for submission to Optics Express

Showing 1–50 of 129 results for author: Zhu, Z