Search | arXiv e-print repository

Towards reliable respiratory disease diagnosis based on cough sounds and vision transformers

Authors: Qian Wang, Zhaoyang Bu, Jiaxuan Mao, Wenyu Zhu, Jingya Zhao, Wei Du, Guochao Shi, Min Zhou, Si Chen, Jieming Qu

Abstract: Recent advancements in deep learning techniques have sparked performance boosts in various real-world applications including disease diagnosis based on multi-modal medical data. Cough sound data-based respiratory disease (e.g., COVID-19 and Chronic Obstructive Pulmonary Disease) diagnosis has also attracted much attention. However, existing works usually utilise traditional machine learning or dee… ▽ More Recent advancements in deep learning techniques have sparked performance boosts in various real-world applications including disease diagnosis based on multi-modal medical data. Cough sound data-based respiratory disease (e.g., COVID-19 and Chronic Obstructive Pulmonary Disease) diagnosis has also attracted much attention. However, existing works usually utilise traditional machine learning or deep models of moderate scales. On the other hand, the developed approaches are trained and evaluated on small-scale data due to the difficulty of curating and annotating clinical data on scale. To address these issues in prior works, we create a unified framework to evaluate various deep models from lightweight Convolutional Neural Networks (e.g., ResNet18) to modern vision transformers and compare their performance in respiratory disease classification. Based on the observations from such an extensive empirical study, we propose a novel approach to cough-based disease classification based on both self-supervised and supervised learning on a large-scale cough data set. Experimental results demonstrate our proposed approach outperforms prior arts consistently on two benchmark datasets for COVID-19 diagnosis and a proprietary dataset for COPD/non-COPD classification with an AUROC of 92.5%. △ Less

Submitted 28 August, 2024; originally announced August 2024.

arXiv:2407.16634 [pdf, other]

Knowledge-driven AI-generated data for accurate and interpretable breast ultrasound diagnoses

Authors: Haojun Yu, Youcheng Li, Nan Zhang, Zihan Niu, Xuantong Gong, Yanwen Luo, Quanlin Wu, Wangyan Qin, Mengyuan Zhou, Jie Han, Jia Tao, Ziwei Zhao, Di Dai, Di He, Dong Wang, Binghui Tang, Ling Huo, Qingli Zhu, Yong Wang, Liwei Wang

Abstract: Data-driven deep learning models have shown great capabilities to assist radiologists in breast ultrasound (US) diagnoses. However, their effectiveness is limited by the long-tail distribution of training data, which leads to inaccuracies in rare cases. In this study, we address a long-standing challenge of improving the diagnostic model performance on rare cases using long-tailed data. Specifical… ▽ More Data-driven deep learning models have shown great capabilities to assist radiologists in breast ultrasound (US) diagnoses. However, their effectiveness is limited by the long-tail distribution of training data, which leads to inaccuracies in rare cases. In this study, we address a long-standing challenge of improving the diagnostic model performance on rare cases using long-tailed data. Specifically, we introduce a pipeline, TAILOR, that builds a knowledge-driven generative model to produce tailored synthetic data. The generative model, using 3,749 lesions as source data, can generate millions of breast-US images, especially for error-prone rare cases. The generated data can be further used to build a diagnostic model for accurate and interpretable diagnoses. In the prospective external evaluation, our diagnostic model outperforms the average performance of nine radiologists by 33.5% in specificity with the same sensitivity, improving their performance by providing predictions with an interpretable decision-making process. Moreover, on ductal carcinoma in situ (DCIS), our diagnostic model outperforms all radiologists by a large margin, with only 34 DCIS lesions in the source data. We believe that TAILOR can potentially be extended to various diseases and imaging modalities. △ Less

Submitted 23 July, 2024; originally announced July 2024.

arXiv:2404.16484 [pdf, other]

Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi Jin, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu , et al. (50 additional authors not shown)

Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod… ▽ More This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF codec, instead of JPEG. All the proposed methods improve PSNR fidelity over Lanczos interpolation, and process images under 10ms. Out of the 160 participants, 25 teams submitted their code and models. The solutions present novel designs tailored for memory-efficiency and runtime on edge devices. This survey describes the best solutions for real-time SR of compressed high-resolution images. △ Less

Submitted 25 April, 2024; originally announced April 2024.

Comments: CVPR 2024, AI for Streaming (AIS) Workshop

arXiv:2404.12804 [pdf, other]

Linearly-evolved Transformer for Pan-sharpening

Authors: Junming Hou, Zihan Cao, Naishan Zheng, Xuan Li, Xiaoyu Chen, Xinyang Liu, Xiaofeng Cong, Man Zhou, Danfeng Hong

Abstract: Vision transformer family has dominated the satellite pan-sharpening field driven by the global-wise spatial information modeling mechanism from the core self-attention ingredient. The standard modeling rules within these promising pan-sharpening methods are to roughly stack the transformer variants in a cascaded manner. Despite the remarkable advancement, their success may be at the huge cost of… ▽ More Vision transformer family has dominated the satellite pan-sharpening field driven by the global-wise spatial information modeling mechanism from the core self-attention ingredient. The standard modeling rules within these promising pan-sharpening methods are to roughly stack the transformer variants in a cascaded manner. Despite the remarkable advancement, their success may be at the huge cost of model parameters and FLOPs, thus preventing its application over low-resource satellites.To address this challenge between favorable performance and expensive computation, we tailor an efficient linearly-evolved transformer variant and employ it to construct a lightweight pan-sharpening framework. In detail, we deepen into the popular cascaded transformer modeling with cutting-edge methods and develop the alternative 1-order linearly-evolved transformer variant with the 1-dimensional linear convolution chain to achieve the same function. In this way, our proposed method is capable of benefiting the cascaded modeling rule while achieving favorable performance in the efficient manner. Extensive experiments over multiple satellite datasets suggest that our proposed method achieves competitive performance against other state-of-the-art with fewer computational resources. Further, the consistently favorable performance has been verified over the hyper-spectral image fusion task. Our main focus is to provide an alternative global modeling framework with an efficient structure. The code will be publicly available. △ Less

Submitted 19 April, 2024; originally announced April 2024.

Comments: 10 pages

arXiv:2404.10343 [pdf, other]

The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang , et al. (109 additional authors not shown)

Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such… ▽ More This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such as runtime, parameters, and FLOPs, while still maintaining a peak signal-to-noise ratio (PSNR) of approximately 26.90 dB on the DIV2K_LSDIR_valid dataset and 26.99 dB on the DIV2K_LSDIR_test dataset. In addition, this challenge has 4 tracks including the main track (overall performance), sub-track 1 (runtime), sub-track 2 (FLOPs), and sub-track 3 (parameters). In the main track, all three metrics (ie runtime, FLOPs, and parameter count) were considered. The ranking of the main track is calculated based on a weighted sum-up of the scores of all other sub-tracks. In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking. In sub-track 2, the number of FLOPs was considered. The score calculated based on the corresponding FLOPs was used to determine the ranking. In sub-track 3, the number of parameters was considered. The score calculated based on the corresponding parameters was used to determine the ranking. RLFN is set as the baseline for efficiency measurement. The challenge had 262 registered participants, and 34 teams made valid submissions. They gauge the state-of-the-art in efficient single-image super-resolution. To facilitate the reproducibility of the challenge and enable other researchers to build upon these findings, the code and the pre-trained model of validated solutions are made publicly available at https://github.com/Amazingren/NTIRE2024_ESR/. △ Less

Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

arXiv:2403.15483 [pdf]

Rolling bearing fault diagnosis method based on generative adversarial enhanced multi-scale convolutional neural network model

Authors: Maoxuan Zhou, Wei Kang, Kun He

Abstract: In order to solve the problem that current convolutional neural networks can not capture the correlation features between the time domain signals of rolling bearings effectively, and the model accuracy is limited by the number and quality of samples, a rolling bearing fault diagnosis method based on generative adversarial enhanced multi-scale convolutional neural network model is proposed. Firstly… ▽ More In order to solve the problem that current convolutional neural networks can not capture the correlation features between the time domain signals of rolling bearings effectively, and the model accuracy is limited by the number and quality of samples, a rolling bearing fault diagnosis method based on generative adversarial enhanced multi-scale convolutional neural network model is proposed. Firstly, Gram angular field coding technique is used to encode the time domain signal of the rolling bearing and generate the feature map to retain the complete information of the vibration signal. Then, the re-sulting data is divided into a training set, a validation set, and a test set. Among them, the training set is input into the gradient penalty Wasserstein distance generation adversarial network to complete the training, and a new sample with similar features to the training sample is obtained, and then the original training set is expanded. Next, multi-scale convolution is used to extract the fault features of the extended training set, and the feature graph is normalized by example to overcome the influence of the difference in feature distribution. Finally, the attention mechanism is applied to the adaptive weighting of normalized features and the extraction of deep features, and the fault diagnosis is completed by the softmax classifier. Compared with ResNet method, the experimental results show that the proposed method has better generalization performance and anti-noise performance. △ Less

Submitted 21 March, 2024; originally announced March 2024.

arXiv:2403.00987 [pdf, other]

Composite Distributed Learning and Synchronization of Nonlinear Multi-Agent Systems with Complete Uncertain Dynamics

Authors: Emadodin Jandaghi, Dalton L. Stein, Adam Hoburg, Paolo Stegagno, Mingxi Zhou, Chengzhi Yuan

Abstract: This paper addresses the problem of composite synchronization and learning control in a network of multi-agent robotic manipulator systems with heterogeneous nonlinear uncertainties under a leader-follower framework. A novel two-layer distributed adaptive learning control strategy is introduced, comprising a first-layer distributed cooperative estimator and a second-layer decentralized determinist… ▽ More This paper addresses the problem of composite synchronization and learning control in a network of multi-agent robotic manipulator systems with heterogeneous nonlinear uncertainties under a leader-follower framework. A novel two-layer distributed adaptive learning control strategy is introduced, comprising a first-layer distributed cooperative estimator and a second-layer decentralized deterministic learning controller. The first layer is to facilitate each robotic agent's estimation of the leader's information. The second layer is responsible for both controlling individual robot agents to track desired reference trajectories and accurately identifying/learning their nonlinear uncertain dynamics. The proposed distributed learning control scheme represents an advancement in the existing literature due to its ability to manage robotic agents with completely uncertain dynamics including uncertain mass matrices. This allows the robotic control to be environment-independent which can be used in various settings, from underwater to space where identifying system dynamics parameters is challenging. The stability and parameter convergence of the closed-loop system are rigorously analyzed using the Lyapunov method. Numerical simulations validate the effectiveness of the proposed scheme. △ Less

Submitted 9 May, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

arXiv:2401.00160 [pdf, other]

Acceleration Estimation of Signal Propagation Path Length Changes for Wireless Sensing

Authors: Jiacheng Wang, Hongyang Du, Dusit Niyato, Mu Zhou, Jiawen Kang, H. Vincent Poor

Abstract: As indoor applications grow in diversity, wireless sensing, vital in areas like localization and activity recognition, is attracting renewed interest. Indoor wireless sensing relies on signal processing, particularly channel state information (CSI) based signal parameter estimation. Nonetheless, regarding reflected signals induced by dynamic human targets, no satisfactory algorithm yet exists for… ▽ More As indoor applications grow in diversity, wireless sensing, vital in areas like localization and activity recognition, is attracting renewed interest. Indoor wireless sensing relies on signal processing, particularly channel state information (CSI) based signal parameter estimation. Nonetheless, regarding reflected signals induced by dynamic human targets, no satisfactory algorithm yet exists for estimating the acceleration of dynamic path length change (DPLC), which is crucial for various sensing tasks in this context. Hence, this paper proposes DP-AcE, a CSI-based DPLC acceleration estimation algorithm. We first model the relationship between the phase difference of adjacent CSI measurements and the DPLC's acceleration. Unlike existing works assuming constant velocity, DP-AcE considers both velocity and acceleration, yielding a more accurate and objective representation. Using this relationship, an algorithm combining scaling with Fourier transform is proposed to realize acceleration estimation. We evaluate DP-AcE via the acceleration estimation and acceleration-based fall detection with the collected CSI. Experimental results reveal that, using distance as the metric, DP-AcE achieves a median acceleration estimation percentage error of 4.38%. Furthermore, in multi-target scenarios, the fall detection achieves an average true positive rate of 89.56% and a false positive rate of 11.78%, demonstrating its importance in enhancing indoor wireless sensing capabilities. △ Less

Submitted 30 December, 2023; originally announced January 2024.

arXiv:2312.09576 [pdf, other]

SegRap2023: A Benchmark of Organs-at-Risk and Gross Tumor Volume Segmentation for Radiotherapy Planning of Nasopharyngeal Carcinoma

Authors: Xiangde Luo, Jia Fu, Yunxin Zhong, Shuolin Liu, Bing Han, Mehdi Astaraki, Simone Bendazzoli, Iuliana Toma-Dasu, Yiwen Ye, Ziyang Chen, Yong Xia, Yanzhou Su, Jin Ye, Junjun He, Zhaohu Xing, Hongqiu Wang, Lei Zhu, Kaixiang Yang, Xin Fang, Zhiwei Wang, Chan Woong Lee, Sang Joon Park, Jaehee Chun, Constantin Ulrich, Klaus H. Maier-Hein , et al. (17 additional authors not shown)

Abstract: Radiation therapy is a primary and effective NasoPharyngeal Carcinoma (NPC) treatment strategy. The precise delineation of Gross Tumor Volumes (GTVs) and Organs-At-Risk (OARs) is crucial in radiation treatment, directly impacting patient prognosis. Previously, the delineation of GTVs and OARs was performed by experienced radiation oncologists. Recently, deep learning has achieved promising results… ▽ More Radiation therapy is a primary and effective NasoPharyngeal Carcinoma (NPC) treatment strategy. The precise delineation of Gross Tumor Volumes (GTVs) and Organs-At-Risk (OARs) is crucial in radiation treatment, directly impacting patient prognosis. Previously, the delineation of GTVs and OARs was performed by experienced radiation oncologists. Recently, deep learning has achieved promising results in many medical image segmentation tasks. However, for NPC OARs and GTVs segmentation, few public datasets are available for model development and evaluation. To alleviate this problem, the SegRap2023 challenge was organized in conjunction with MICCAI2023 and presented a large-scale benchmark for OAR and GTV segmentation with 400 Computed Tomography (CT) scans from 200 NPC patients, each with a pair of pre-aligned non-contrast and contrast-enhanced CT scans. The challenge's goal was to segment 45 OARs and 2 GTVs from the paired CT scans. In this paper, we detail the challenge and analyze the solutions of all participants. The average Dice similarity coefficient scores for all submissions ranged from 76.68\% to 86.70\%, and 70.42\% to 73.44\% for OARs and GTVs, respectively. We conclude that the segmentation of large-size OARs is well-addressed, and more efforts are needed for GTVs and small-size or thin-structure OARs. The benchmark will remain publicly available here: https://segrap2023.grand-challenge.org △ Less

Submitted 15 December, 2023; originally announced December 2023.

Comments: A challenge report of SegRap2023 (organized in conjunction with MICCAI2023)

arXiv:2312.04767 [pdf, other]

Finite Horizon Reinforcement Learning in Solving Optimal Control of State-Dependent Switched Systems

Authors: Mi Zhou

Abstract: In this article, the deep deterministic policy gradient (DDPG) method is used to learn an optimal control policy of a multi-region state-dependent switched system. We observe good performance of this model-free method and explain it in a rigorous mathematical language. The performance of the learning-based methods is compared with the optimal solution given by vanilla differential dynamic programm… ▽ More In this article, the deep deterministic policy gradient (DDPG) method is used to learn an optimal control policy of a multi-region state-dependent switched system. We observe good performance of this model-free method and explain it in a rigorous mathematical language. The performance of the learning-based methods is compared with the optimal solution given by vanilla differential dynamic programming (DDP) in three customized environments. △ Less

Submitted 14 December, 2023; v1 submitted 7 December, 2023; originally announced December 2023.

arXiv:2312.00951 [pdf, other]

AV4EV: Open-Source Modular Autonomous Electric Vehicle Platform for Making Mobility Research Accessible

Authors: Zhijie Qiao, Mingyan Zhou, Zhijun Zhuang, Tejas Agarwal, Felix Jahncke, Po-Jen Wang, Jason Friedman, Hongyi Lai, Divyanshu Sahu, Tomáš Nagy, Martin Endler, Jason Schlessman, Rahul Mangharam

Abstract: When academic researchers develop and validate autonomous driving algorithms, there is a challenge in balancing high-performance capabilities with the cost and complexity of the vehicle platform. Much of today's research on autonomous vehicles (AV) is limited to experimentation on expensive commercial vehicles that require large skilled teams to retrofit the vehicles and test them in dedicated fac… ▽ More When academic researchers develop and validate autonomous driving algorithms, there is a challenge in balancing high-performance capabilities with the cost and complexity of the vehicle platform. Much of today's research on autonomous vehicles (AV) is limited to experimentation on expensive commercial vehicles that require large skilled teams to retrofit the vehicles and test them in dedicated facilities. On the other hand, 1/10th-1/16th scaled-down vehicle platforms are more affordable but have limited similitude in performance and drivability. To address this issue, we present the design of a one-third-scale autonomous electric go-kart platform with open-source mechatronics design along with fully functional autonomous driving software. The platform's multi-modal driving system is capable of manual, autonomous, and teleoperation driving modes. It also features a flexible sensing suite for the algorithm deployment across perception, localization, planning, and control. This development serves as a bridge between full-scale vehicles and reduced-scale cars while accelerating cost-effective algorithmic advancements. Our experimental results demonstrate the AV4EV platform's capabilities and ease of use for developing new AV algorithms. All materials are available at AV4EV.org to stimulate collaborative efforts within the AV and electric vehicle (EV) communities. △ Less

Submitted 12 April, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

Comments: 6 pages, 5 figures

arXiv:2311.03557 [pdf, other]

Spatio-Temporal Similarity Measure based Multi-Task Learning for Predicting Alzheimer's Disease Progression using MRI Data

Authors: Xulong Wang, Yu Zhang, Menghui Zhou, Tong Liu, Jun Qi, Po Yang

Abstract: Identifying and utilising various biomarkers for tracking Alzheimer's disease (AD) progression have received many recent attentions and enable helping clinicians make the prompt decisions. Traditional progression models focus on extracting morphological biomarkers in regions of interest (ROIs) from MRI/PET images, such as regional average cortical thickness and regional volume. They are effective… ▽ More Identifying and utilising various biomarkers for tracking Alzheimer's disease (AD) progression have received many recent attentions and enable helping clinicians make the prompt decisions. Traditional progression models focus on extracting morphological biomarkers in regions of interest (ROIs) from MRI/PET images, such as regional average cortical thickness and regional volume. They are effective but ignore the relationships between brain ROIs over time, which would lead to synergistic deterioration. For exploring the synergistic deteriorating relationship between these biomarkers, in this paper, we propose a novel spatio-temporal similarity measure based multi-task learning approach for effectively predicting AD progression and sensitively capturing the critical relationships between biomarkers. Specifically, we firstly define a temporal measure for estimating the magnitude and velocity of biomarker change over time, which indicate a changing trend(temporal). Converting this trend into the vector, we then compare this variability between biomarkers in a unified vector space(spatial). The experimental results show that compared with directly ROI based learning, our proposed method is more effective in predicting disease progression. Our method also enables performing longitudinal stability selection to identify the changing relationships between biomarkers, which play a key role in disease progression. We prove that the synergistic deteriorating biomarkers between cortical volumes or surface areas have a significant effect on the cognitive prediction. △ Less

Submitted 6 November, 2023; originally announced November 2023.

arXiv:2311.02691 [pdf, ps, other]

Age of Information Analysis for CR-NOMA Aided Uplink Systems with Randomly Arrived Packets

Authors: Yanshi Sun, Yanglin Ye, Zhiguo Ding, Momiao Zhou, Lei Liu

Abstract: This paper studies the application of cognitive radio inspired non-orthogonal multiple access (CR-NOMA) to reduce age of information (AoI) for uplink transmission. In particular, a time division multiple access (TDMA) based legacy network is considered, where each user is allocated with a dedicated time slot to transmit its status update information. The CR-NOMA is implemented as an add-on to the… ▽ More This paper studies the application of cognitive radio inspired non-orthogonal multiple access (CR-NOMA) to reduce age of information (AoI) for uplink transmission. In particular, a time division multiple access (TDMA) based legacy network is considered, where each user is allocated with a dedicated time slot to transmit its status update information. The CR-NOMA is implemented as an add-on to the TDMA legacy network, which enables each user to have more opportunities to transmit by sharing other user's time slots. A rigorous analytical framework is developed to obtain the expressions for AoIs achieved by CR-NOMA with and without re-transmission, by taking the randomness of the status update generating process into consideration. Numerical results are presented to verify the accuracy of the developed analysis. It is shown that the AoI can be significantly reduced by applying CR-NOMA compared to TDMA. Moreover, the use of re-transmission is helpful to reduce AoI, especially when the status arrival rate is low. △ Less

Submitted 5 November, 2023; originally announced November 2023.

arXiv:2310.04722 [pdf, other]

A Holistic Evaluation of Piano Sound Quality

Authors: Monan Zhou, Shangda Wu, Shaohua Ji, Zijin Li, Wei Li

Abstract: This paper aims to develop a holistic evaluation method for piano sound quality to assist in purchasing decisions. Unlike previous studies that focused on the effect of piano performance techniques on sound quality, this study evaluates the inherent sound quality of different pianos. To derive quality evaluation systems, the study uses subjective questionnaires based on a piano sound quality datas… ▽ More This paper aims to develop a holistic evaluation method for piano sound quality to assist in purchasing decisions. Unlike previous studies that focused on the effect of piano performance techniques on sound quality, this study evaluates the inherent sound quality of different pianos. To derive quality evaluation systems, the study uses subjective questionnaires based on a piano sound quality dataset. The method selects the optimal piano classification models by comparing the fine-tuning results of different pre-training models of Convolutional Neural Networks (CNN). To improve the interpretability of the models, the study applies Equivalent Rectangular Bandwidth (ERB) analysis. The results reveal that musically trained individuals are better able to distinguish between the sound quality differences of different pianos. The best fine-tuned CNN pre-trained backbone achieves a high accuracy of 98.3\% as the piano classifier. However, the dataset is limited, and the audio is sliced to increase its quantity, resulting in a lack of diversity and balance, so we use focal loss to reduce the impact of data imbalance. To optimize the method, the dataset will be expanded, or few-shot learning techniques will be employed in future research. △ Less

Submitted 7 October, 2023; originally announced October 2023.

arXiv:2309.17315 [pdf, other]

Data-Driven Newton Raphson Controller Based on Koopman Operator Theory

Authors: Mi Zhou

Abstract: Newton-Raphson controller is a powerful prediction-based variable gain integral controller. Basically, the classical model-based Newton-Raphson controller requires two elements: the prediction of the system output and the derivative of the predicted output with respect to the control input. In real applications, the model may not be known and it is infeasible to predict the system sometime ahead a… ▽ More Newton-Raphson controller is a powerful prediction-based variable gain integral controller. Basically, the classical model-based Newton-Raphson controller requires two elements: the prediction of the system output and the derivative of the predicted output with respect to the control input. In real applications, the model may not be known and it is infeasible to predict the system sometime ahead and calculate the derivative by finite difference method as done in simulation. To solve these problems, in this work, we utilize the Koopman operator framework to reconstruct a linear model of the original nonlinear dynamical system and then utilize the output of the new linear system as the predictor of the Newton-Raphson controller. This method is only based on collected data within some time instant thus more practical. Three examples related to highly nonlinear systems are provided to verify the effectiveness of our proposed method. △ Less

Submitted 29 September, 2023; originally announced September 2023.

arXiv:2309.16834 [pdf, other]

Energy Optimal Control of a Harmonic Oscillator with a State Inequality Constraint

Authors: Mi Zhou, Erik I Verriest, Chaouki Abdallah

Abstract: In this article, the optimal control problem for a harmonic oscillator with an inequality constraint is considered. The applied energy of the oscillator during a fixed final time period is used as the performance criterion. The analytical solution with both small and large terminal time is found for a special case when the undriven oscillator system is initially at rest. For other initial states o… ▽ More In this article, the optimal control problem for a harmonic oscillator with an inequality constraint is considered. The applied energy of the oscillator during a fixed final time period is used as the performance criterion. The analytical solution with both small and large terminal time is found for a special case when the undriven oscillator system is initially at rest. For other initial states of the Harmonic oscillator, the optimal solution is found to have three modes: wait-move, move-wait, and move-wait-move given a longer terminal time. △ Less

Submitted 28 September, 2023; originally announced September 2023.

arXiv:2309.13259 [pdf, other]

WikiMT++ Dataset Card

Authors: Monan Zhou, Shangda Wu, Yuan Wang, Wei Li

Abstract: WikiMT++ is an expanded and refined version of WikiMusicText (WikiMT), featuring 1010 curated lead sheets in ABC notation. To expand application scenarios of WikiMT, we add both objective (album, lyrics, video) and subjective emotion (12 emotion adjectives) and emo\_4q (Russell 4Q) attributes, enhancing its usability for music information retrieval, conditional music generation, automatic composit… ▽ More WikiMT++ is an expanded and refined version of WikiMusicText (WikiMT), featuring 1010 curated lead sheets in ABC notation. To expand application scenarios of WikiMT, we add both objective (album, lyrics, video) and subjective emotion (12 emotion adjectives) and emo\_4q (Russell 4Q) attributes, enhancing its usability for music information retrieval, conditional music generation, automatic composition, and emotion classification, etc. Additionally, CLaMP is implemented to correct the attributes inherited from WikiMT to reduce errors introduced during original data collection and enhance the accuracy and completeness of our dataset. △ Less

Submitted 23 September, 2023; originally announced September 2023.

arXiv:2309.01958 [pdf, other]

Empowering Low-Light Image Enhancer through Customized Learnable Priors

Authors: Naishan Zheng, Man Zhou, Yanmeng Dong, Xiangyu Rui, Jie Huang, Chongyi Li, Feng Zhao

Abstract: Deep neural networks have achieved remarkable progress in enhancing low-light images by improving their brightness and eliminating noise. However, most existing methods construct end-to-end mapping networks heuristically, neglecting the intrinsic prior of image enhancement task and lacking transparency and interpretability. Although some unfolding solutions have been proposed to relieve these issu… ▽ More Deep neural networks have achieved remarkable progress in enhancing low-light images by improving their brightness and eliminating noise. However, most existing methods construct end-to-end mapping networks heuristically, neglecting the intrinsic prior of image enhancement task and lacking transparency and interpretability. Although some unfolding solutions have been proposed to relieve these issues, they rely on proximal operator networks that deliver ambiguous and implicit priors. In this work, we propose a paradigm for low-light image enhancement that explores the potential of customized learnable priors to improve the transparency of the deep unfolding paradigm. Motivated by the powerful feature representation capability of Masked Autoencoder (MAE), we customize MAE-based illumination and noise priors and redevelop them from two perspectives: 1) \textbf{structure flow}: we train the MAE from a normal-light image to its illumination properties and then embed it into the proximal operator design of the unfolding architecture; and m2) \textbf{optimization flow}: we train MAE from a normal-light image to its gradient representation and then employ it as a regularization term to constrain noise in the model output. These designs improve the interpretability and representation capability of the model.Extensive experiments on multiple low-light image enhancement datasets demonstrate the superiority of our proposed paradigm over state-of-the-art methods. Code is available at https://github.com/zheng980629/CUE. △ Less

Submitted 5 September, 2023; originally announced September 2023.

Comments: Accepted by ICCV 2023

arXiv:2308.16083 [pdf, other]

Learned Image Reasoning Prior Penetrates Deep Unfolding Network for Panchromatic and Multi-Spectral Image Fusion

Authors: Man Zhou, Jie Huang, Naishan Zheng, Chongyi Li

Abstract: The success of deep neural networks for pan-sharpening is commonly in a form of black box, lacking transparency and interpretability. To alleviate this issue, we propose a novel model-driven deep unfolding framework with image reasoning prior tailored for the pan-sharpening task. Different from existing unfolding solutions that deliver the proximal operator networks as the uncertain and vague prio… ▽ More The success of deep neural networks for pan-sharpening is commonly in a form of black box, lacking transparency and interpretability. To alleviate this issue, we propose a novel model-driven deep unfolding framework with image reasoning prior tailored for the pan-sharpening task. Different from existing unfolding solutions that deliver the proximal operator networks as the uncertain and vague priors, our framework is motivated by the content reasoning ability of masked autoencoders (MAE) with insightful designs. Specifically, the pre-trained MAE with spatial masking strategy, acting as intrinsic reasoning prior, is embedded into unfolding architecture. Meanwhile, the pre-trained MAE with spatial-spectral masking strategy is treated as the regularization term within loss function to constrain the spatial-spectral consistency. Such designs penetrate the image reasoning prior into deep unfolding networks while improving its interpretability and representation capability. The uniqueness of our framework is that the holistic learning process is explicitly integrated with the inherent physical mechanism underlying the pan-sharpening task. Extensive experiments on multiple satellite datasets demonstrate the superiority of our method over the existing state-of-the-art approaches. Code will be released at \url{https://manman1995.github.io/}. △ Less

Submitted 30 August, 2023; originally announced August 2023.

Comments: 10 pages; Accepted by ICCV 2023

arXiv:2307.00479 [pdf, other]

Domain Transfer Through Image-to-Image Translation for Uncertainty-Aware Prostate Cancer Classification

Authors: Meng Zhou, Amoon Jamzad, Jason Izard, Alexandre Menard, Robert Siemens, Parvin Mousavi

Abstract: Prostate Cancer (PCa) is a prevalent disease among men, and multi-parametric MRIs offer a non-invasive method for its detection. While MRI-based deep learning solutions have shown promise in supporting PCa diagnosis, acquiring sufficient training data, particularly in local clinics remains challenging. One potential solution is to take advantage of publicly available datasets to pre-train deep mod… ▽ More Prostate Cancer (PCa) is a prevalent disease among men, and multi-parametric MRIs offer a non-invasive method for its detection. While MRI-based deep learning solutions have shown promise in supporting PCa diagnosis, acquiring sufficient training data, particularly in local clinics remains challenging. One potential solution is to take advantage of publicly available datasets to pre-train deep models and fine-tune them on the local data, but multi-source MRIs can pose challenges due to cross-domain distribution differences. These limitations hinder the adoption of explainable and reliable deep-learning solutions in local clinics for PCa diagnosis. In this work, we present a novel approach for unpaired image-to-image translation of prostate multi-parametric MRIs and an uncertainty-aware training approach for classifying clinically significant PCa, to be applied in data-constrained settings such as local and small clinics. Our approach involves a novel pipeline for translating unpaired 3.0T multi-parametric prostate MRIs to 1.5T, thereby augmenting the available training data. Additionally, we introduce an evidential deep learning approach to estimate model uncertainty and employ dataset filtering techniques during training. Furthermore, we propose a simple, yet efficient Evidential Focal Loss, combining focal loss with evidential uncertainty, to train our model effectively. Our experiments demonstrate that the proposed method significantly improves the Area Under ROC Curve (AUC) by over 20% compared to the previous work. Our code is available at https://github.com/med-i-lab/DT_UE_PCa △ Less

Submitted 3 June, 2024; v1 submitted 2 July, 2023; originally announced July 2023.

Comments: Preprint. In Submission

arXiv:2306.14274 [pdf, other]

MEPNet: A Model-Driven Equivariant Proximal Network for Joint Sparse-View Reconstruction and Metal Artifact Reduction in CT Images

Authors: Hong Wang, Minghao Zhou, Dong Wei, Yuexiang Li, Yefeng Zheng

Abstract: Sparse-view computed tomography (CT) has been adopted as an important technique for speeding up data acquisition and decreasing radiation dose. However, due to the lack of sufficient projection data, the reconstructed CT images often present severe artifacts, which will be further amplified when patients carry metallic implants. For this joint sparse-view reconstruction and metal artifact reductio… ▽ More Sparse-view computed tomography (CT) has been adopted as an important technique for speeding up data acquisition and decreasing radiation dose. However, due to the lack of sufficient projection data, the reconstructed CT images often present severe artifacts, which will be further amplified when patients carry metallic implants. For this joint sparse-view reconstruction and metal artifact reduction task, most of the existing methods are generally confronted with two main limitations: 1) They are almost built based on common network modules without fully embedding the physical imaging geometry constraint of this specific task into the dual-domain learning; 2) Some important prior knowledge is not deeply explored and sufficiently utilized. Against these issues, we specifically construct a dual-domain reconstruction model and propose a model-driven equivariant proximal network, called MEPNet. The main characteristics of MEPNet are: 1) It is optimization-inspired and has a clear working mechanism; 2) The involved proximal operator is modeled via a rotation equivariant convolutional neural network, which finely represents the inherent rotational prior underlying the CT scanning that the same organ can be imaged at different angles. Extensive experiments conducted on several datasets comprehensively substantiate that compared with the conventional convolution-based proximal network, such a rotation equivariance mechanism enables our proposed method to achieve better reconstruction performance with fewer network parameters. We will release the code at \url{https://github.com/hongwang01/MEPNet}. △ Less

Submitted 25 June, 2023; originally announced June 2023.

Comments: MICCAI 2023

arXiv:2306.12020 [pdf, other]

doi 10.1109/ICASSP49357.2023.10095084

Visual-Aware Text-to-Speech

Authors: Mohan Zhou, Yalong Bai, Wei Zhang, Ting Yao, Tiejun Zhao, Tao Mei

Abstract: Dynamically synthesizing talking speech that actively responds to a listening head is critical during the face-to-face interaction. For example, the speaker could take advantage of the listener's facial expression to adjust the tones, stressed syllables, or pauses. In this work, we present a new visual-aware text-to-speech (VA-TTS) task to synthesize speech conditioned on both textual inputs and s… ▽ More Dynamically synthesizing talking speech that actively responds to a listening head is critical during the face-to-face interaction. For example, the speaker could take advantage of the listener's facial expression to adjust the tones, stressed syllables, or pauses. In this work, we present a new visual-aware text-to-speech (VA-TTS) task to synthesize speech conditioned on both textual inputs and sequential visual feedback (e.g., nod, smile) of the listener in face-to-face communication. Different from traditional text-to-speech, VA-TTS highlights the impact of visual modality. On this newly-minted task, we devise a baseline model to fuse phoneme linguistic information and listener visual signals for speech synthesis. Extensive experiments on multimodal conversation dataset ViCo-X verify our proposal for generating more natural audio with scenario-appropriate rhythm and prosody. △ Less

Submitted 21 June, 2023; originally announced June 2023.

Comments: accepted as oral and top 3% paper by ICASSP 2023

Journal ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023, 1-5

arXiv:2305.07774 [pdf, other]

PanFlowNet: A Flow-Based Deep Network for Pan-sharpening

Authors: Gang Yang, Xiangyong Cao, Wenzhe Xiao, Man Zhou, Aiping Liu, Xun chen, Deyu Meng

Abstract: Pan-sharpening aims to generate a high-resolution multispectral (HRMS) image by integrating the spectral information of a low-resolution multispectral (LRMS) image with the texture details of a high-resolution panchromatic (PAN) image. It essentially inherits the ill-posed nature of the super-resolution (SR) task that diverse HRMS images can degrade into an LRMS image. However, existing deep learn… ▽ More Pan-sharpening aims to generate a high-resolution multispectral (HRMS) image by integrating the spectral information of a low-resolution multispectral (LRMS) image with the texture details of a high-resolution panchromatic (PAN) image. It essentially inherits the ill-posed nature of the super-resolution (SR) task that diverse HRMS images can degrade into an LRMS image. However, existing deep learning-based methods recover only one HRMS image from the LRMS image and PAN image using a deterministic mapping, thus ignoring the diversity of the HRMS image. In this paper, to alleviate this ill-posed issue, we propose a flow-based pan-sharpening network (PanFlowNet) to directly learn the conditional distribution of HRMS image given LRMS image and PAN image instead of learning a deterministic mapping. Specifically, we first transform this unknown conditional distribution into a given Gaussian distribution by an invertible network, and the conditional distribution can thus be explicitly defined. Then, we design an invertible Conditional Affine Coupling Block (CACB) and further build the architecture of PanFlowNet by stacking a series of CACBs. Finally, the PanFlowNet is trained by maximizing the log-likelihood of the conditional distribution given a training set and can then be used to predict diverse HRMS images. The experimental results verify that the proposed PanFlowNet can generate various HRMS images given an LRMS image and a PAN image. Additionally, the experimental results on different kinds of satellite datasets also demonstrate the superiority of our PanFlowNet compared with other state-of-the-art methods both visually and quantitatively. △ Less

Submitted 16 May, 2023; v1 submitted 12 May, 2023; originally announced May 2023.

arXiv:2304.04484 [pdf, other]

Quasi-Synchronous Random Access for Massive MIMO-Based LEO Satellite Constellations

Authors: Keke Ying, Zhen Gao, Sheng Chen, Mingyu Zhou, Dezhi Zheng, Symeon Chatzinotas, Björn Ottersten, H. Vincent Poor

Abstract: Low earth orbit (LEO) satellite constellation-enabled communication networks are expected to be an important part of many Internet of Things (IoT) deployments due to their unique advantage of providing seamless global coverage. In this paper, we investigate the random access problem in massive multiple-input multiple-output-based LEO satellite systems, where the multi-satellite cooperative process… ▽ More Low earth orbit (LEO) satellite constellation-enabled communication networks are expected to be an important part of many Internet of Things (IoT) deployments due to their unique advantage of providing seamless global coverage. In this paper, we investigate the random access problem in massive multiple-input multiple-output-based LEO satellite systems, where the multi-satellite cooperative processing mechanism is considered. Specifically, at edge satellite nodes, we conceive a training sequence padded multi-carrier system to overcome the issue of imperfect synchronization, where the training sequence is utilized to detect the devices' activity and estimate their channels. Considering the inherent sparsity of terrestrial-satellite links and the sporadic traffic feature of IoT terminals, we utilize the orthogonal approximate message passing-multiple measurement vector algorithm to estimate the delay coefficients and user terminal activity. To further utilize the structure of the receive array, a two-dimensional estimation of signal parameters via rotational invariance technique is performed for enhancing channel estimation. Finally, at the central server node, we propose a majority voting scheme to enhance activity detection by aggregating backhaul information from multiple satellites. Moreover, multi-satellite cooperative linear data detection and multi-satellite cooperative Bayesian dequantization data detection are proposed to cope with perfect and quantized backhaul, respectively. Simulation results verify the effectiveness of our proposed schemes in terms of channel estimation, activity detection, and data detection for quasi-synchronous random access in satellite systems. △ Less

Submitted 10 April, 2023; originally announced April 2023.

Comments: 38 pages, 16 figures. This paper has been accepted by IEEE JSAC SI on 3GPP Technologies: 5G-Advanced and Beyond. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2303.13046 [pdf, other]

Quantized Phase Alignment by Discrete Phase Shifts for Reconfigurable Intelligent Surface-Assisted Communication Systems

Authors: Jian Sang, Jifeng Lan, Mingyong Zhou, Boning Gao, Wankai Tang, Xiao Li, Xinping Yi, Shi Jin

Abstract: Reconfigurable intelligent surface (RIS) has aroused a surge of interest in recent years. In this paper, we investigate the joint phase alignment and phase quantization on discrete phase shift designs for RIS-assisted single-input single-output (SISO) system. Firstly, the phenomena of phase distribution in far field and near field are respectively unveiled, paving the way for discretization of pha… ▽ More Reconfigurable intelligent surface (RIS) has aroused a surge of interest in recent years. In this paper, we investigate the joint phase alignment and phase quantization on discrete phase shift designs for RIS-assisted single-input single-output (SISO) system. Firstly, the phenomena of phase distribution in far field and near field are respectively unveiled, paving the way for discretization of phase shift for RIS. Then, aiming at aligning phases, the phase distribution law and its underlying degree-of-freedom (DoF) are characterized, serving as the guideline of phase quantization strategies. Subsequently, two phase quantization methods, dynamic threshold phase quantization (DTPQ) and equal interval phase quantization (EIPQ), are proposed to strengthen the beamforming effect of RIS. DTPQ is capable of calculating the optimal discrete phase shifts with linear complexity in the number of unit cells on RIS, whilst EIPQ is a simplified method with a constant complexity yielding sub-optimal solution. Simulation results demonstrate that both methods achieve substantial improvements on power gain, stability, and robustness over traditional quantization methods. The path loss (PL) scaling law under discrete phase shift of RIS is unveiled for the first time, with the phase shifts designed by DTPQ due to its optimality. Additionally, the field trials conducted at 2.6 GHz and 35 GHz validate the favourable performance of the proposed methods in practical communication environment. △ Less

Submitted 23 March, 2023; originally announced March 2023.

arXiv:2303.04414 [pdf, other]

Next-Generation URLLC with Massive Devices: A Unified Semi-Blind Detection Framework for Sourced and Unsourced Random Access

Authors: Malong Ke, Zhen Gao, Mingyu Zhou, Dezhi Zheng, Derrick Wing Kwan Ng, H. Vincent Poor

Abstract: This paper proposes a unified semi-blind detection framework for sourced and unsourced random access (RA), which enables next-generation ultra-reliable low-latency communications (URLLC) with massive devices. Specifically, the active devices transmit their uplink access signals in a grant-free manner to realize ultra-low access latency. Meanwhile, the base station aims to achieve ultra-reliable da… ▽ More This paper proposes a unified semi-blind detection framework for sourced and unsourced random access (RA), which enables next-generation ultra-reliable low-latency communications (URLLC) with massive devices. Specifically, the active devices transmit their uplink access signals in a grant-free manner to realize ultra-low access latency. Meanwhile, the base station aims to achieve ultra-reliable data detection under severe inter-device interference without exploiting explicit channel state information (CSI). We first propose an efficient transmitter design, where a small amount of reference information (RI) is embedded in the access signal to resolve the inherent ambiguities incurred by the unknown CSI. At the receiver, we further develop a successive interference cancellation-based semi-blind detection scheme, where a bilinear generalized approximate message passing algorithm is utilized for joint channel and signal estimation (JCSE), while the embedded RI is exploited for ambiguity elimination. Particularly, a rank selection approach and a RI-aided initialization strategy are incorporated to reduce the algorithmic computational complexity and to enhance the JCSE reliability, respectively. Besides, four enabling techniques are integrated to satisfy the stringent latency and reliability requirements of massive URLLC. Numerical results demonstrate that the proposed semi-blind detection framework offers a better scalability-latency-reliability tradeoff than the state-of-the-art detection schemes dedicated to sourced or unsourced RA. △ Less

Submitted 20 March, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

Comments: This paper has been accepted by IEEE JSAC special issue on next-generation URLLC in 6G

arXiv:2302.05816 [pdf, ps, other]

A Policy Gradient Framework for Stochastic Optimal Control Problems with Global Convergence Guarantee

Authors: Mo Zhou, Jianfeng Lu

Abstract: We consider policy gradient methods for stochastic optimal control problem in continuous time. In particular, we analyze the gradient flow for the control, viewed as a continuous time limit of the policy gradient method. We prove the global convergence of the gradient flow and establish a convergence rate under some regularity assumptions. The main novelty in the analysis is the notion of local op… ▽ More We consider policy gradient methods for stochastic optimal control problem in continuous time. In particular, we analyze the gradient flow for the control, viewed as a continuous time limit of the policy gradient method. We prove the global convergence of the gradient flow and establish a convergence rate under some regularity assumptions. The main novelty in the analysis is the notion of local optimal control function, which is introduced to characterize the local optimality of the iterate. △ Less

Submitted 22 April, 2023; v1 submitted 11 February, 2023; originally announced February 2023.

Comments: 53 pages

MSC Class: 93E20 (Primary); 49L12 49M05 (secondary)

arXiv:2301.02277 [pdf]

LostNet: A smart way for lost and find

Authors: Meihua Zhou, Ivan Fung, Li Yang, Nan Wan, Keke Di, Tingting Wang

Abstract: Due to the enormous population growth of cities in recent years, objects are frequently lost and unclaimed on public transportation, in restaurants, or any other public areas. While services like Find My iPhone can easily identify lost electronic devices, more valuable objects cannot be tracked in an intelligent manner, making it impossible for administrators to reclaim a large number of lost and… ▽ More Due to the enormous population growth of cities in recent years, objects are frequently lost and unclaimed on public transportation, in restaurants, or any other public areas. While services like Find My iPhone can easily identify lost electronic devices, more valuable objects cannot be tracked in an intelligent manner, making it impossible for administrators to reclaim a large number of lost and found items in a timely manner. We present a method that significantly reduces the complexity of searching by comparing previous images of lost and recovered things provided by the owner with photos taken when registered lost and found items are received. In this research, we will primarily design a photo matching network by combining the fine-tuning method of MobileNetv2 with CBAM Attention and using the Internet framework to develop an online lost and found image identification system. Our implementation gets a testing accuracy of 96.8% using only 665.12M GLFOPs and 3.5M training parameters. It can recognize practice images and can be run on a regular laptop. △ Less

Submitted 5 January, 2023; originally announced January 2023.

arXiv:2210.08181 [pdf, other]

Panchromatic and Multispectral Image Fusion via Alternating Reverse Filtering Network

Authors: Keyu Yan, Man Zhou, Jie Huang, Feng Zhao, Chengjun Xie, Chongyi Li, Danfeng Hong

Abstract: Panchromatic (PAN) and multi-spectral (MS) image fusion, named Pan-sharpening, refers to super-resolve the low-resolution (LR) multi-spectral (MS) images in the spatial domain to generate the expected high-resolution (HR) MS images, conditioning on the corresponding high-resolution PAN images. In this paper, we present a simple yet effective \textit{alternating reverse filtering network} for pan-s… ▽ More Panchromatic (PAN) and multi-spectral (MS) image fusion, named Pan-sharpening, refers to super-resolve the low-resolution (LR) multi-spectral (MS) images in the spatial domain to generate the expected high-resolution (HR) MS images, conditioning on the corresponding high-resolution PAN images. In this paper, we present a simple yet effective \textit{alternating reverse filtering network} for pan-sharpening. Inspired by the classical reverse filtering that reverses images to the status before filtering, we formulate pan-sharpening as an alternately iterative reverse filtering process, which fuses LR MS and HR MS in an interpretable manner. Different from existing model-driven methods that require well-designed priors and degradation assumptions, the reverse filtering process avoids the dependency on pre-defined exact priors. To guarantee the stability and convergence of the iterative process via contraction mapping on a metric space, we develop the learnable multi-scale Gaussian kernel module, instead of using specific filters. We demonstrate the theoretical feasibility of such formulations. Extensive experiments on diverse scenes to thoroughly verify the performance of our method, significantly outperforming the state of the arts. △ Less

Submitted 14 October, 2022; originally announced October 2022.

Journal ref: NeurIPS2022

arXiv:2209.12775 [pdf, other]

Jump Law of Co-State in Optimal Control for State-Dependent Switched Systems and Applications

Authors: Mi Zhou, Erik I. Verriest, Yue Guan, Chaouki Abdallah

Abstract: This paper presents the jump law of co-states in optimal control for state-dependent switched systems. The number of switches and the switching modes are assumed to be known a priori. A proposed jump law is rigorously derived by theoretical analysis and illustrated by simulation results. An algorithm is then proposed to solve optimal control for state-dependent hybrid systems. Through numerical si… ▽ More This paper presents the jump law of co-states in optimal control for state-dependent switched systems. The number of switches and the switching modes are assumed to be known a priori. A proposed jump law is rigorously derived by theoretical analysis and illustrated by simulation results. An algorithm is then proposed to solve optimal control for state-dependent hybrid systems. Through numerical simulations, we further show that the proposed approach is more efficient than existing methods in solving optimal control for state-dependent switched systems. △ Less

Submitted 26 September, 2022; originally announced September 2022.

arXiv:2209.07030 [pdf, other]

doi 10.1145/3503161.3548068

Model-Guided Multi-Contrast Deep Unfolding Network for MRI Super-resolution Reconstruction

Authors: Gang Yang, Li Zhang, Man Zhou, Aiping Liu, Xun Chen, Zhiwei Xiong, Feng Wu

Abstract: Magnetic resonance imaging (MRI) with high resolution (HR) provides more detailed information for accurate diagnosis and quantitative image analysis. Despite the significant advances, most existing super-resolution (SR) reconstruction network for medical images has two flaws: 1) All of them are designed in a black-box principle, thus lacking sufficient interpretability and further limiting their p… ▽ More Magnetic resonance imaging (MRI) with high resolution (HR) provides more detailed information for accurate diagnosis and quantitative image analysis. Despite the significant advances, most existing super-resolution (SR) reconstruction network for medical images has two flaws: 1) All of them are designed in a black-box principle, thus lacking sufficient interpretability and further limiting their practical applications. Interpretable neural network models are of significant interest since they enhance the trustworthiness required in clinical practice when dealing with medical images. 2) most existing SR reconstruction approaches only use a single contrast or use a simple multi-contrast fusion mechanism, neglecting the complex relationships between different contrasts that are critical for SR improvement. To deal with these issues, in this paper, a novel Model-Guided interpretable Deep Unfolding Network (MGDUN) for medical image SR reconstruction is proposed. The Model-Guided image SR reconstruction approach solves manually designed objective functions to reconstruct HR MRI. We show how to unfold an iterative MGDUN algorithm into a novel model-guided deep unfolding network by taking the MRI observation matrix and explicit multi-contrast relationship matrix into account during the end-to-end optimization. Extensive experiments on the multi-contrast IXI dataset and BraTs 2019 dataset demonstrate the superiority of our proposed model. △ Less

Submitted 14 September, 2022; originally announced September 2022.

Comments: Accepted to ACMMM 2022, 9 pages

arXiv:2208.05383 [pdf, other]

doi 10.1109/TIM.2022.3200360

Precise Repositioning of Robotic Ultrasound: Improving Registration-based Motion Compensation using Ultrasound Confidence Optimization

Authors: Zhongliang Jiang, Nehil Danis, Yuan Bi, Mingchuan Zhou, Markus Kroenke, Thomas Wendler, Nassir Navab

Abstract: Robotic ultrasound (US) imaging has been seen as a promising solution to overcome the limitations of free-hand US examinations, i.e., inter-operator variability. However, the fact that robotic US systems cannot react to subject movements during scans limits their clinical acceptance. Regarding human sonographers, they often react to patient movements by repositioning the probe or even restarting t… ▽ More Robotic ultrasound (US) imaging has been seen as a promising solution to overcome the limitations of free-hand US examinations, i.e., inter-operator variability. However, the fact that robotic US systems cannot react to subject movements during scans limits their clinical acceptance. Regarding human sonographers, they often react to patient movements by repositioning the probe or even restarting the acquisition, in particular for the scans of anatomies with long structures like limb arteries. To realize this characteristic, we proposed a vision-based system to monitor the subject's movement and automatically update the scan trajectory thus seamlessly obtaining a complete 3D image of the target anatomy. The motion monitoring module is developed using the segmented object masks from RGB images. Once the subject is moved, the robot will stop and recompute a suitable trajectory by registering the surface point clouds of the object obtained before and after the movement using the iterative closest point algorithm. Afterward, to ensure optimal contact conditions after repositioning US probe, a confidence-based fine-tuning process is used to avoid potential gaps between the probe and contact surface. Finally, the whole system is validated on a human-like arm phantom with an uneven surface, while the object segmentation network is also validated on volunteers. The results demonstrate that the presented system can react to object movements and reliably provide accurate 3D images. △ Less

Submitted 5 September, 2022; v1 submitted 10 August, 2022; originally announced August 2022.

Comments: The paper has been accepted by IEEE TIM. Video: https://www.youtube.com/watch?v=MUtgSXS7EZI

arXiv:2208.03323 [pdf, other]

doi 10.1145/3503161.3548193

DeepWSD: Projecting Degradations in Perceptual Space to Wasserstein Distance in Deep Feature Space

Authors: Xingran Liao, Baoliang Chen, Hanwei Zhu, Shiqi Wang, Mingliang Zhou, Sam Kwong

Abstract: Existing deep learning-based full-reference IQA (FR-IQA) models usually predict the image quality in a deterministic way by explicitly comparing the features, gauging how severely distorted an image is by how far the corresponding feature lies from the space of the reference images. Herein, we look at this problem from a different viewpoint and propose to model the quality degradation in perceptua… ▽ More Existing deep learning-based full-reference IQA (FR-IQA) models usually predict the image quality in a deterministic way by explicitly comparing the features, gauging how severely distorted an image is by how far the corresponding feature lies from the space of the reference images. Herein, we look at this problem from a different viewpoint and propose to model the quality degradation in perceptual space from a statistical distribution perspective. As such, the quality is measured based upon the Wasserstein distance in the deep feature domain. More specifically, the 1DWasserstein distance at each stage of the pre-trained VGG network is measured, based on which the final quality score is performed. The deep Wasserstein distance (DeepWSD) performed on features from neural networks enjoys better interpretability of the quality contamination caused by various types of distortions and presents an advanced quality prediction capability. Extensive experiments and theoretical analysis show the superiority of the proposed DeepWSD in terms of both quality prediction and optimization. △ Less

Submitted 4 August, 2022; originally announced August 2022.

Comments: ACM Multimedia 2022 accepted thesis

arXiv:2207.01983 [pdf, ps, other]

Massive Access in Extra Large-Scale MIMO with Mixed-ADC over Near Field Channels

Authors: Yikun Mei, Zhen Gao, De Mi, Mingyu Zhou, Dezhi Zheng, Michail Matthaiou, Pei Xiao, Robert Schober

Abstract: Massive connectivity for extra large-scale multi-input multi-output (XL-MIMO) systems is a challenging issue due to the near-field access channels and the prohibitive cost. In this paper, we propose an uplink grant-free massive access scheme for XL-MIMO systems, in which a mixed-analog-to-digital converters (ADC) architecture is adopted to strike the right balance between access performance and po… ▽ More Massive connectivity for extra large-scale multi-input multi-output (XL-MIMO) systems is a challenging issue due to the near-field access channels and the prohibitive cost. In this paper, we propose an uplink grant-free massive access scheme for XL-MIMO systems, in which a mixed-analog-to-digital converters (ADC) architecture is adopted to strike the right balance between access performance and power consumption. By exploiting the spatial-domain structured sparsity and the piecewise angular-domain cluster sparsity of massive access channels, a compressive sensing (CS)-based two-stage orthogonal approximate message passing algorithm is proposed to efficiently solve the joint activity detection and channel estimation problem. Particularly, high-precision quantized measurements are leveraged to perform accurate hyper-parameter estimation, thereby facilitating the activity detection. Moreover, we adopt a subarray-wise estimation strategy to overcome the severe angular-domain energy dispersion problem which is caused by the near-field effect in XL-MIMO channels. Simulation results verify the superiority of our proposed algorithm over state-of-the-art CS algorithms for massive access based on XL-MIMO with mixed-ADC architectures. △ Less

Submitted 3 April, 2023; v1 submitted 5 July, 2022; originally announced July 2022.

Comments: Accepted by IEEE TVT

arXiv:2206.07163 [pdf, other]

DeepRecon: Joint 2D Cardiac Segmentation and 3D Volume Reconstruction via A Structure-Specific Generative Method

Authors: Qi Chang, Zhennan Yan, Mu Zhou, Di Liu, Khalid Sawalha, Meng Ye, Qilong Zhangli, Mikael Kanski, Subhi Al Aref, Leon Axel, Dimitris Metaxas

Abstract: Joint 2D cardiac segmentation and 3D volume reconstruction are fundamental to building statistical cardiac anatomy models and understanding functional mechanisms from motion patterns. However, due to the low through-plane resolution of cine MR and high inter-subject variance, accurately segmenting cardiac images and reconstructing the 3D volume are challenging. In this study, we propose an end-to-… ▽ More Joint 2D cardiac segmentation and 3D volume reconstruction are fundamental to building statistical cardiac anatomy models and understanding functional mechanisms from motion patterns. However, due to the low through-plane resolution of cine MR and high inter-subject variance, accurately segmenting cardiac images and reconstructing the 3D volume are challenging. In this study, we propose an end-to-end latent-space-based framework, DeepRecon, that generates multiple clinically essential outcomes, including accurate image segmentation, synthetic high-resolution 3D image, and 3D reconstructed volume. Our method identifies the optimal latent representation of the cine image that contains accurate semantic information for cardiac structures. In particular, our model jointly generates synthetic images with accurate semantic information and segmentation of the cardiac structures using the optimal latent representation. We further explore downstream applications of 3D shape reconstruction and 4D motion pattern adaptation by the different latent-space manipulation strategies.The simultaneously generated high-resolution images present a high interpretable value to assess the cardiac shape and motion.Experimental results demonstrate the effectiveness of our approach on multiple fronts including 2D segmentation, 3D reconstruction, downstream 4D motion pattern adaption performance. △ Less

Submitted 14 June, 2022; originally announced June 2022.

Comments: MICCAI2022

arXiv:2205.09452 [pdf, other]

Learning-based AC-OPF Solvers on Realistic Network and Realistic Loads

Authors: Tsun Ho Aaron Cheung, Min Zhou, Minghua Chen

Abstract: Deep learning approaches for the Alternating Current-Optimal Power Flow (AC-OPF) problem are under active research in recent years. A common shortcoming in this area of research is the lack of a dataset that includes both a realistic power network topology and the corresponding realistic loads. To address this issue, we construct an AC-OPF formulation-ready dataset called TAS-97 that contains real… ▽ More Deep learning approaches for the Alternating Current-Optimal Power Flow (AC-OPF) problem are under active research in recent years. A common shortcoming in this area of research is the lack of a dataset that includes both a realistic power network topology and the corresponding realistic loads. To address this issue, we construct an AC-OPF formulation-ready dataset called TAS-97 that contains realistic network information and realistic bus loads from Tasmania's electricity network. We found that the realistic loads in Tasmania are correlated between buses and they show signs of an underlying multivariate normal distribution. Feasibility-optimized end-to-end deep neural network models are trained and tested on the constructed dataset. Trained on samples with bus loads generated from a fitted multivariate normal distribution, our learning-based AC-OPF solver achieves 0.13% cost optimality gap, 99.73% feasibility rate, and 38.62 times of speedup on realistic testing samples when compared to PYPOWER. △ Less

Submitted 19 May, 2022; originally announced May 2022.

Comments: 8 pages, 6 figures

arXiv:2204.03125 [pdf, other]

Deep transfer learning for system identification using long short-term memory neural networks

Authors: Kaicheng Niu, Mi Zhou, Chaouki T. Abdallah, Mohammad Hayajneh

Abstract: Recurrent neural networks (RNNs) have many advantages over more traditional system identification techniques. They may be applied to linear and nonlinear systems, and they require fewer modeling assumptions. However, these neural network models may also need larger amounts of data to learn and generalize. Furthermore, neural networks training is a time-consuming process. Hence, building upon long-… ▽ More Recurrent neural networks (RNNs) have many advantages over more traditional system identification techniques. They may be applied to linear and nonlinear systems, and they require fewer modeling assumptions. However, these neural network models may also need larger amounts of data to learn and generalize. Furthermore, neural networks training is a time-consuming process. Hence, building upon long-short term memory neural networks (LSTM), this paper proposes using two types of deep transfer learning, namely parameter fine-tuning and freezing, to reduce the data and computation requirements for system identification. We apply these techniques to identify two dynamical systems, namely a second-order linear system and a Wiener-Hammerstein nonlinear system. Results show that compared with direct learning, our method accelerates learning by 10% to 50%, which also saves data and computing resources. △ Less

Submitted 6 April, 2022; originally announced April 2022.

arXiv:2203.10819 [pdf, ps, other]

doi 10.1109/JSAC.2022.3158696

Proximal Policy Optimization-based Transmit Beamforming and Phase-shift Design in an IRS-aided ISAC System for the THz Band

Authors: Xiangnan Liu, Haijun Zhang, Keping Long, Mingyu Zhou, Yonghui Li, H. Vincent Poor

Abstract: In this paper, an IRS-aided integrated sensing and communications (ISAC) system operating in the terahertz (THz) band is proposed to maximize the system capacity. Transmit beamforming and phase-shift design are transformed into a universal optimization problem with ergodic constraints. Then the joint optimization of transmit beamforming and phase-shift design is achieved by gradient-based, primal-… ▽ More In this paper, an IRS-aided integrated sensing and communications (ISAC) system operating in the terahertz (THz) band is proposed to maximize the system capacity. Transmit beamforming and phase-shift design are transformed into a universal optimization problem with ergodic constraints. Then the joint optimization of transmit beamforming and phase-shift design is achieved by gradient-based, primal-dual proximal policy optimization (PPO) in the multi-user multiple-input single-output (MISO) scenario. Specifically, the actor part generates continuous transmit beamforming and the critic part takes charge of discrete phase shift design. Based on the MISO scenario, we investigate a distributed PPO (DPPO) framework with the concept of multi-threading learning in the multi-user multiple-input multiple-output (MIMO) scenario. Simulation results demonstrate the effectiveness of the primal-dual PPO algorithm and its multi-threading version in terms of transmit beamforming and phase-shift design. △ Less

Submitted 21 March, 2022; originally announced March 2022.

arXiv:2203.10726 [pdf, other]

TransFusion: Multi-view Divergent Fusion for Medical Image Segmentation with Transformers

Authors: Di Liu, Yunhe Gao, Qilong Zhangli, Ligong Han, Xiaoxiao He, Zhaoyang Xia, Song Wen, Qi Chang, Zhennan Yan, Mu Zhou, Dimitris Metaxas

Abstract: Combining information from multi-view images is crucial to improve the performance and robustness of automated methods for disease diagnosis. However, due to the non-alignment characteristics of multi-view images, building correlation and data fusion across views largely remain an open problem. In this study, we present TransFusion, a Transformer-based architecture to merge divergent multi-view im… ▽ More Combining information from multi-view images is crucial to improve the performance and robustness of automated methods for disease diagnosis. However, due to the non-alignment characteristics of multi-view images, building correlation and data fusion across views largely remain an open problem. In this study, we present TransFusion, a Transformer-based architecture to merge divergent multi-view imaging information using convolutional layers and powerful attention mechanisms. In particular, the Divergent Fusion Attention (DiFA) module is proposed for rich cross-view context modeling and semantic dependency mining, addressing the critical issue of capturing long-range correlations between unaligned data from different image views. We further propose the Multi-Scale Attention (MSA) to collect global correspondence of multi-scale feature representations. We evaluate TransFusion on the Multi-Disease, Multi-View \& Multi-Center Right Ventricular Segmentation in Cardiac MRI (M\&Ms-2) challenge cohort. TransFusion demonstrates leading performance against the state-of-the-art methods and opens up new perspectives for multi-view imaging integration towards robust medical image segmentation. △ Less

Submitted 5 September, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

arXiv:2203.04960 [pdf, other]

Memory-augmented Deep Unfolding Network for Guided Image Super-resolution

Authors: Man Zhou, Keyu Yan, Jinshan Pan, Wenqi Ren, Qi Xie, Xiangyong Cao

Abstract: Guided image super-resolution (GISR) aims to obtain a high-resolution (HR) target image by enhancing the spatial resolution of a low-resolution (LR) target image under the guidance of a HR image. However, previous model-based methods mainly takes the entire image as a whole, and assume the prior distribution between the HR target image and the HR guidance image, simply ignoring many non-local comm… ▽ More Guided image super-resolution (GISR) aims to obtain a high-resolution (HR) target image by enhancing the spatial resolution of a low-resolution (LR) target image under the guidance of a HR image. However, previous model-based methods mainly takes the entire image as a whole, and assume the prior distribution between the HR target image and the HR guidance image, simply ignoring many non-local common characteristics between them. To alleviate this issue, we firstly propose a maximal a posterior (MAP) estimation model for GISR with two types of prior on the HR target image, i.e., local implicit prior and global implicit prior. The local implicit prior aims to model the complex relationship between the HR target image and the HR guidance image from a local perspective, and the global implicit prior considers the non-local auto-regression property between the two images from a global perspective. Secondly, we design a novel alternating optimization algorithm to solve this model for GISR. The algorithm is in a concise framework that facilitates to be replicated into commonly used deep network structures. Thirdly, to reduce the information loss across iterative stages, the persistent memory mechanism is introduced to augment the information representation by exploiting the Long short-term memory unit (LSTM) in the image and feature spaces. In this way, a deep network with certain interpretation and high representation ability is built. Extensive experimental results validate the superiority of our method on a variety of GISR tasks, including Pan-sharpening, depth image super-resolution, and MR image super-resolution. △ Less

Submitted 12 February, 2022; originally announced March 2022.

Comments: 24 pages, 16 figures

arXiv:2203.00131 [pdf, other]

A Data-scalable Transformer for Medical Image Segmentation: Architecture, Model Efficiency, and Benchmark

Authors: Yunhe Gao, Mu Zhou, Di Liu, Zhennan Yan, Shaoting Zhang, Dimitris N. Metaxas

Abstract: Transformers have demonstrated remarkable performance in natural language processing and computer vision. However, existing vision Transformers struggle to learn from limited medical data and are unable to generalize on diverse medical image tasks. To tackle these challenges, we present MedFormer, a data-scalable Transformer designed for generalizable 3D medical image segmentation. Our approach in… ▽ More Transformers have demonstrated remarkable performance in natural language processing and computer vision. However, existing vision Transformers struggle to learn from limited medical data and are unable to generalize on diverse medical image tasks. To tackle these challenges, we present MedFormer, a data-scalable Transformer designed for generalizable 3D medical image segmentation. Our approach incorporates three key elements: a desirable inductive bias, hierarchical modeling with linear-complexity attention, and multi-scale feature fusion that integrates spatial and semantic information globally. MedFormer can learn across tiny- to large-scale data without pre-training. Comprehensive experiments demonstrate MedFormer's potential as a versatile segmentation backbone, outperforming CNNs and vision Transformers on seven public datasets covering multiple modalities (e.g., CT and MRI) and various medical targets (e.g., healthy organs, diseased tissues, and tumors). We provide public access to our models and evaluation pipeline, offering solid baselines and unbiased comparisons to advance a wide range of downstream clinical applications. △ Less

Submitted 4 April, 2023; v1 submitted 28 February, 2022; originally announced March 2022.

arXiv:2202.08916 [pdf, other]

Graph Convolutional Networks for Multi-modality Medical Imaging: Methods, Architectures, and Clinical Applications

Authors: Kexin Ding, Mu Zhou, Zichen Wang, Qiao Liu, Corey W. Arnold, Shaoting Zhang, Dimitri N. Metaxas

Abstract: Image-based characterization and disease understanding involve integrative analysis of morphological, spatial, and topological information across biological scales. The development of graph convolutional networks (GCNs) has created the opportunity to address this information complexity via graph-driven architectures, since GCNs can perform feature aggregation, interaction, and reasoning with remar… ▽ More Image-based characterization and disease understanding involve integrative analysis of morphological, spatial, and topological information across biological scales. The development of graph convolutional networks (GCNs) has created the opportunity to address this information complexity via graph-driven architectures, since GCNs can perform feature aggregation, interaction, and reasoning with remarkable flexibility and efficiency. These GCNs capabilities have spawned a new wave of research in medical imaging analysis with the overarching goal of improving quantitative disease understanding, monitoring, and diagnosis. Yet daunting challenges remain for designing the important image-to-graph transformation for multi-modality medical imaging and gaining insights into model interpretation and enhanced clinical decision support. In this review, we present recent GCNs developments in the context of medical image analysis including imaging data from radiology and histopathology. We discuss the fast-growing use of graph network architectures in medical image analysis to improve disease diagnosis and patient outcomes in clinical practice. To foster cross-disciplinary research, we present GCNs technical advancements, emerging medical applications, identify common challenges in the use of image-based GCNs and their extensions in model interpretation, large-scale benchmarks that promise to transform the scope of medical image studies and related graph-driven medical research. △ Less

Submitted 20 April, 2022; v1 submitted 17 February, 2022; originally announced February 2022.

arXiv:2112.07087 [pdf, other]

Heuristic Hyperparameter Optimization for Convolutional Neural Networks using Genetic Algorithm

Authors: Meng Zhou

Abstract: In recent years, people from all over the world are suffering from one of the most severe diseases in history, known as Coronavirus disease 2019, COVID-19 for short. When the virus reaches the lungs, it has a higher probability to cause lung pneumonia and sepsis. X-ray image is a powerful tool in identifying the typical features of the infection for COVID-19 patients. The radiologists and patholog… ▽ More In recent years, people from all over the world are suffering from one of the most severe diseases in history, known as Coronavirus disease 2019, COVID-19 for short. When the virus reaches the lungs, it has a higher probability to cause lung pneumonia and sepsis. X-ray image is a powerful tool in identifying the typical features of the infection for COVID-19 patients. The radiologists and pathologists observe that ground-glass opacity appears in the chest X-ray for infected patient \cite{cozzi2021ground}, and it could be used as one of the criteria during the diagnosis process. In the past few years, deep learning has proven to be one of the most powerful methods in the field of image classification. Due to significant differences in Chest X-Ray between normal and infected people \cite{rousan2020chest}, deep models could be used to identify the presence of the disease given a patient's Chest X-Ray. Many deep models are complex, and it evolves with lots of input parameters. Designers sometimes struggle with the tuning process for deep models, especially when they build up the model from scratch. Genetic Algorithm, inspired by the biological evolution process, plays a key role in solving such complex problems. In this paper, I proposed a genetic-based approach to optimize the Convolutional Neural Network(CNN) for the Chest X-Ray classification task. △ Less

Submitted 13 December, 2021; originally announced December 2021.

Comments: 8 pages, 3 figures

arXiv:2110.14285 [pdf, ps, other]

Over-the-Air Aggregation for Federated Learning: Waveform Superposition and Prototype Validation

Authors: Huayan Guo, Yifan Zhu, Haoyu Ma, Vincent K. N. Lau, Kaibin Huang, Xiaofan Li, Huabin Nong, Mingyu Zhou

Abstract: In this paper, we develop an orthogonal-frequency-division-multiplexing (OFDM)-based over-the-air (OTA) aggregation solution for wireless federated learning (FL). In particular, the local gradients in massive IoT devices are modulated by an analog waveform and are then transmitted using the same wireless resources. To this end, achieving perfect waveform superposition is the key challenge, which i… ▽ More In this paper, we develop an orthogonal-frequency-division-multiplexing (OFDM)-based over-the-air (OTA) aggregation solution for wireless federated learning (FL). In particular, the local gradients in massive IoT devices are modulated by an analog waveform and are then transmitted using the same wireless resources. To this end, achieving perfect waveform superposition is the key challenge, which is difficult due to the existence of frame timing offset (TO) and carrier frequency offset (CFO). In order to address these issues, we propose a two-stage waveform pre-equalization technique with a customized multiple access protocol that can estimate and then mitigate the TO and CFO for the OTA aggregation. Based on the proposed solution, we develop a hardware transceiver and application software to train a real-world FL task, which learns a deep neural network to predict the received signal strength with global positioning system information. Experiments verify that the proposed OTA aggregation solution can achieve comparable performance to offline learning procedures with high prediction accuracy. △ Less

Submitted 27 October, 2021; originally announced October 2021.

arXiv:2110.07469 [pdf, other]

Shaping Large Population Agent Behaviors Through Entropy-Regularized Mean-Field Games

Authors: Yue Guan, Mi Zhou, Ali Pakniyat, Panagiotis Tsiotras

Abstract: Mean-field games (MFG) were introduced to efficiently analyze approximate Nash equilibria in large population settings. In this work, we consider entropy-regularized mean-field games with a finite state-action space in a discrete time setting. We show that entropy regularization provides the necessary regularity conditions, that are lacking in the standard finite mean field games. Such regularity… ▽ More Mean-field games (MFG) were introduced to efficiently analyze approximate Nash equilibria in large population settings. In this work, we consider entropy-regularized mean-field games with a finite state-action space in a discrete time setting. We show that entropy regularization provides the necessary regularity conditions, that are lacking in the standard finite mean field games. Such regularity conditions enable us to design fixed-point iteration algorithms to find the unique mean-field equilibrium (MFE). Furthermore, the reference policy used in the regularization provides an extra parameter, through which one can control the behavior of the population. We first consider a stochastic game with a large population of $N$ homogeneous agents. We establish conditions for the existence of a Nash equilibrium in the limiting case as $N$ tends to infinity, and we demonstrate that the Nash equilibrium for the infinite population case is also an $ε$-Nash equilibrium for the $N$-agent system, where the sub-optimality $ε$ is of order $\mathcal{O}\big(1/\sqrt{N}\big)$. Finally, we verify the theoretical guarantees through a resource allocation example and demonstrate the efficacy of using a reference policy to control the behavior of a large population. △ Less

Submitted 22 July, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

arXiv:2110.05765 [pdf, other]

Music Sentiment Transfer

Authors: Miles Sigel, Michael Zhou, Jiebo Luo

Abstract: Music sentiment transfer is a completely novel task. Sentiment transfer is a natural evolution of the heavily-studied style transfer task, as sentiment transfer is rooted in applying the sentiment of a source to be the new sentiment for a target piece of media; yet compared to style transfer, sentiment transfer has been only scantily studied on images. Music sentiment transfer attempts to apply th… ▽ More Music sentiment transfer is a completely novel task. Sentiment transfer is a natural evolution of the heavily-studied style transfer task, as sentiment transfer is rooted in applying the sentiment of a source to be the new sentiment for a target piece of media; yet compared to style transfer, sentiment transfer has been only scantily studied on images. Music sentiment transfer attempts to apply the high level objective of sentiment transfer to the domain of music. We propose CycleGAN to bridge disparate domains. In order to use the network, we choose to use symbolic, MIDI, data as the music format. Through the use of a cycle consistency loss, we are able to create one-to-one mappings that preserve the content and realism of the source data. Results and literature suggest that the task of music sentiment transfer is more difficult than image sentiment transfer because of the temporal characteristics of music and lack of existing datasets. △ Less

Submitted 12 October, 2021; originally announced October 2021.

Comments: NSF REU: Computational Methods for Understanding Music, Media, and Minds, University of Rochester

arXiv:2107.08411 [pdf, other]

doi 10.1109/LRA.2021.3099080

Deformation-Aware Robotic 3D Ultrasound

Authors: Zhongliang Jiang, Yue Zhou, Yuan Bi, Mingchuan Zhou, Thomas Wendler, Nassir Navab

Abstract: Tissue deformation in ultrasound (US) imaging leads to geometrical errors when measuring tissues due to the pressure exerted by probes. Such deformation has an even larger effect on 3D US volumes as the correct compounding is limited by the inconsistent location and geometry. This work proposes a patient-specified stiffness-based method to correct the tissue deformations in robotic 3D US acquisiti… ▽ More Tissue deformation in ultrasound (US) imaging leads to geometrical errors when measuring tissues due to the pressure exerted by probes. Such deformation has an even larger effect on 3D US volumes as the correct compounding is limited by the inconsistent location and geometry. This work proposes a patient-specified stiffness-based method to correct the tissue deformations in robotic 3D US acquisitions. To obtain the patient-specified model, robotic palpation is performed at sampling positions on the tissue. The contact force, US images and the probe poses of the palpation procedure are recorded. The contact force and the probe poses are used to estimate the nonlinear tissue stiffness. The images are fed to an optical flow algorithm to compute the pixel displacement. Then the pixel-wise tissue deformation under different forces is characterized by a coupled quadratic regression. To correct the deformation at unseen positions on the trajectory for building 3D volumes, an interpolation is performed based on the stiffness values computed at the sampling positions. With the stiffness and recorded force, the tissue displacement could be corrected. The method was validated on two blood vessel phantoms with different stiffness. The results demonstrate that the method can effectively correct the force-induced deformation and finally generate 3D tissue geometries △ Less

Submitted 18 July, 2021; originally announced July 2021.

Comments: Accepted for publication in IEEE Robotics and Automation Letters; Video: https://www.youtube.com/watch?v=MlZtugQ2cvQ

Journal ref: IEEE Robotics and Automation Letters 2021

arXiv:2105.14758 [pdf, other]

Low-Dose CT Denoising Using a Structure-Preserving Kernel Prediction Network

Authors: Lu Xu, Yuwei Zhang, Ying Liu, Daoye Wang, Mu Zhou, Jimmy Ren, Jingwei Wei, Zhaoxiang Ye

Abstract: Low-dose CT has been a key diagnostic imaging modality to reduce the potential risk of radiation overdose to patient health. Despite recent advances, CNN-based approaches typically apply filters in a spatially invariant way and adopt similar pixel-level losses, which treat all regions of the CT image equally and can be inefficient when fine-grained structures coexist with non-uniformly distributed… ▽ More Low-dose CT has been a key diagnostic imaging modality to reduce the potential risk of radiation overdose to patient health. Despite recent advances, CNN-based approaches typically apply filters in a spatially invariant way and adopt similar pixel-level losses, which treat all regions of the CT image equally and can be inefficient when fine-grained structures coexist with non-uniformly distributed noises. To address this issue, we propose a Structure-preserving Kernel Prediction Network (StructKPN) that combines the kernel prediction network with a structure-aware loss function that utilizes the pixel gradient statistics and guides the model towards spatially-variant filters that enhance noise removal, prevent over-smoothing and preserve detailed structures for different regions in CT imaging. Extensive experiments demonstrated that our approach achieved superior performance on both synthetic and non-synthetic datasets, and better preserves structures that are highly desired in clinical screening and low-dose protocol optimization. △ Less

Submitted 23 July, 2021; v1 submitted 31 May, 2021; originally announced May 2021.

Comments: ICIP2021

arXiv:2104.07667 [pdf]

Shoulder Implant X-Ray Manufacturer Classification: Exploring with Vision Transformer

Authors: Meng Zhou, Shanglin Mo

Abstract: Shoulder replacement surgery, also called total shoulder replacement, is a common and complex surgery in Orthopedics discipline. It involves replacing a dead shoulder joint with an artificial implant. In the market, there are many artificial implant manufacturers and each of them may produce different implants with different structures compares to other providers. The problem arises in the followi… ▽ More Shoulder replacement surgery, also called total shoulder replacement, is a common and complex surgery in Orthopedics discipline. It involves replacing a dead shoulder joint with an artificial implant. In the market, there are many artificial implant manufacturers and each of them may produce different implants with different structures compares to other providers. The problem arises in the following situation: a patient has some problems with the shoulder implant accessory and the manufacturer of that implant maybe unknown to either the patient or the doctor, therefore, correctly identification of the manufacturer is the key prior to the treatment. In this paper, we will demonstrate different methods for classifying the manufacturer of a shoulder implant. We will use Vision Transformer approach to this task for the first time ever △ Less

Submitted 21 April, 2021; v1 submitted 15 April, 2021; originally announced April 2021.

Comments: 11 pages, 12 figures

arXiv:2103.16493 [pdf, other]

Enabling Data Diversity: Efficient Automatic Augmentation via Regularized Adversarial Training

Authors: Yunhe Gao, Zhiqiang Tang, Mu Zhou, Dimitris Metaxas

Abstract: Data augmentation has proved extremely useful by increasing training data variance to alleviate overfitting and improve deep neural networks' generalization performance. In medical image analysis, a well-designed augmentation policy usually requires much expert knowledge and is difficult to generalize to multiple tasks due to the vast discrepancies among pixel intensities, image appearances, and o… ▽ More Data augmentation has proved extremely useful by increasing training data variance to alleviate overfitting and improve deep neural networks' generalization performance. In medical image analysis, a well-designed augmentation policy usually requires much expert knowledge and is difficult to generalize to multiple tasks due to the vast discrepancies among pixel intensities, image appearances, and object shapes in different medical tasks. To automate medical data augmentation, we propose a regularized adversarial training framework via two min-max objectives and three differentiable augmentation models covering affine transformation, deformation, and appearance changes. Our method is more automatic and efficient than previous automatic augmentation methods, which still rely on pre-defined operations with human-specified ranges and costly bi-level optimization. Extensive experiments demonstrated that our approach, with less training overhead, achieves superior performance over state-of-the-art auto-augmentation methods on both tasks of 2D skin cancer classification and 3D organs-at-risk segmentation. △ Less

Submitted 30 March, 2021; originally announced March 2021.

Comments: Accepted by IPMI 2021

Showing 1–50 of 87 results for author: Zhou, M