Search | arXiv e-print repository

arXiv:2407.20165 [pdf, other]

Meta-Learning for Adaptive Control with Automated Mirror Descent

Authors: Sunbochen Tang, Haoyuan Sun, Navid Azizan

Abstract: Adaptive control achieves concurrent parameter learning and stable control under uncertainties that are linearly parameterized with known nonlinear features. Nonetheless, it is often difficult to obtain such nonlinear features. To address this difficulty, recent progress has been made in integrating meta-learning with adaptive control to learn such nonlinear features from data. However, these meta-learning-based control methods rely on classical adaptation laws using gradient descent, which is confined to the Euclidean geometry. In this paper, we propose a novel method that combines meta-learning and adaptation laws based on mirror descent, a popular generalization of gradient descent, which takes advantage of the potentially non-Euclidean geometry of the parameter space. In our approach, meta-learning not only learns the nonlinear features but also searches for a suitable mirror-descent potential function that optimizes control performance. Through numerical simulations, we demonstrate the effectiveness of the proposed method in learning efficient representations and real-time tracking control performance under uncertain dynamics.

Submitted 29 July, 2024; originally announced July 2024.
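To illustrate the contrast the abstract draws between gradient descent and mirror descent, the sketch below compares one Euclidean gradient step with one mirror-descent step under the negative-entropy potential, which yields the exponentiated-gradient update on the probability simplex. The potential, step size, and toy gradient are assumptions chosen for illustration; the paper's adaptation law and its meta-learned potential are not reproduced here.

```python
import numpy as np

def gradient_step(theta, grad, lr):
    # Euclidean geometry: potential psi(theta) = 0.5 * ||theta||^2,
    # so the mirror map is the identity and this is plain gradient descent.
    return theta - lr * grad

def mirror_step_neg_entropy(theta, grad, lr):
    # Assumed potential: negative entropy psi(theta) = sum(theta * log(theta)).
    # Its mirror map is log(theta) + 1; stepping in the dual space and mapping
    # back, followed by the KL (Bregman) projection onto the simplex
    # (a normalization), gives the exponentiated-gradient update.
    w = theta * np.exp(-lr * grad)
    return w / w.sum()

theta = np.array([0.5, 0.3, 0.2])
grad = np.array([1.0, -0.5, 0.0])   # illustrative gradient

theta_gd = gradient_step(theta, grad, lr=0.1)
theta_md = mirror_step_neg_entropy(theta, grad, lr=0.1)
```

The mirror step stays positive and on the simplex by construction, which is the kind of geometric structure a non-Euclidean potential can encode and a Euclidean step ignores.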

arXiv:2407.06800 [pdf, other]

Learn and Don't Forget: Adding a New Language to ASR Foundation Models

Authors: Mengjie Qian, Siyuan Tang, Rao Ma, Kate M. Knill, Mark J. F. Gales

Abstract: Foundation ASR models often support many languages, e.g. 100 languages in Whisper. However, there has been limited work on integrating an additional, typically low-resource, language, while maintaining performance on the original language set. Fine-tuning, while simple, may degrade the accuracy of the original set. We compare three approaches that exploit adaptation parameters: soft language code tuning, which trains only the language code; soft prompt tuning, which trains prepended tokens; and LoRA, where a small set of additional parameters is optimised. Elastic Weight Consolidation (EWC) offers an alternative compromise with the potential to maintain performance in specific target languages. Results show that direct fine-tuning yields the best performance for the new language but degrades existing language capabilities. EWC can address this issue for specific languages. If only adaptation parameters are used, the language capabilities are maintained but at the cost of performance in the new language.

Submitted 19 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.
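EWC is mentioned only briefly above. As a reminder of how the standard EWC regulariser works, here is a NumPy sketch of the quadratic penalty with a diagonal Fisher estimate. The toy gradients, the value of λ (`lam`), and the function names are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def diagonal_fisher(per_sample_grads):
    # Empirical diagonal Fisher: mean of squared per-sample log-likelihood
    # gradients, evaluated at the parameters learned on the original languages.
    return np.mean(np.square(per_sample_grads), axis=0)

def ewc_penalty(theta, theta_star, fisher, lam):
    # Quadratic EWC term: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2.
    # Parameters with large Fisher values are anchored more strongly to theta_star.
    return 0.5 * lam * np.sum(fisher * (theta - theta_star) ** 2)

def ewc_loss(new_task_loss, theta, theta_star, fisher, lam):
    # Total objective while fine-tuning on the new language.
    return new_task_loss + ewc_penalty(theta, theta_star, fisher, lam)

grads = np.array([[1.0, 0.0], [-1.0, 2.0]])   # hypothetical per-sample gradients
fisher = diagonal_fisher(grads)
theta_star = np.array([0.0, 0.0])             # parameters after original training
theta = np.array([1.0, 1.0])                  # current parameters
penalty = ewc_penalty(theta, theta_star, fisher, lam=2.0)
```

Here the second parameter has twice the Fisher weight of the first, so moving it costs twice as much, which is how EWC protects the weights most important to the original languages.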

arXiv:2406.02975 [pdf, other]

A Shared-Aperture Dual-Band sub-6 GHz and mmWave Reconfigurable Intelligent Surface With Independent Operation

Authors: Junhui Rao, Yujie Zhang, Shiwen Tang, Zan Li, Zhaoyang Ming, Jichen Zhang, Chi Yuk Chiu, Ross Murch

Abstract: A novel dual-band reconfigurable intelligent surface (DBI-RIS) design that combines the functionalities of millimeter-wave (mmWave) and sub-6 GHz bands within a single aperture is proposed. This design aims to bridge the gap between current single-band reconfigurable intelligent surfaces (RISs) and wireless systems that utilize sub-6 GHz and mmWave bands and require an RIS with independently reconfigurable dual-band operation. The mmWave element is realized by a double-layer patch antenna loaded with 1-bit phase shifters, providing two reconfigurable states. An 8x8 mmWave element array is selectively interconnected using three RF switches to form a reconfigurable sub-6 GHz element at 3.5 GHz. A suspended electromagnetic band gap (EBG) structure is proposed to suppress surface waves and ensure sufficient geometric space for the phase shifter and control networks in the mmWave element. A low-cost planar spiral inductor (PSI) is carefully optimized to connect mmWave elements, enabling the sub-6 GHz function without affecting mmWave operation. Finally, prototypes of the DBI-RIS are fabricated, and experimental verification is conducted using two separate measurement testbeds. The fabricated sub-6 GHz RIS achieves beam steering from -35 to 35 degrees for a DBI-RIS with 4x4 sub-6 GHz elements, while the mmWave RIS demonstrates beam steering from -30 to 30 degrees for a DBI-RIS with 8x8 mmWave elements; both are in good agreement with simulation results.

Submitted 5 June, 2024; originally announced June 2024.
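The mmWave elements above use 1-bit phase shifters, so each element can apply only one of two phase states. As a generic illustration of how a 1-bit RIS approximates the progressive phase profile needed for beam steering, the sketch below quantizes the ideal phases of a uniform linear array to {0, π}. The linear-array model, half-wavelength spacing, and sign convention are assumptions; this is not the DBI-RIS design or its control code.

```python
import numpy as np

def one_bit_steering_states(n_elements, spacing_wl, theta_in_deg, theta_out_deg):
    # Assumed model: uniform linear array, element spacing given in wavelengths.
    # Ideal progressive phase for reflecting from theta_in to theta_out
    # (sign conventions vary between references):
    # phi_n = -2*pi*(d/lambda)*n*(sin(theta_out) - sin(theta_in)).
    n = np.arange(n_elements)
    delta = np.sin(np.deg2rad(theta_out_deg)) - np.sin(np.deg2rad(theta_in_deg))
    phi = np.mod(-2 * np.pi * spacing_wl * n * delta, 2 * np.pi)
    # 1-bit quantization: choose whichever of {0, pi} is closer to phi,
    # i.e. state 1 (phase pi) when phi lies in (pi/2, 3*pi/2).
    states = ((phi > np.pi / 2) & (phi < 3 * np.pi / 2)).astype(int)
    return states  # 0 -> phase 0, 1 -> phase pi

# Normal incidence, steer toward 20 degrees with an 8-element row.
states = one_bit_steering_states(8, 0.5, 0.0, 20.0)
```

Coarse quantization like this introduces phase errors of up to π/2 per element, which is one reason 1-bit surfaces trade some beam quality for much simpler hardware.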

arXiv:2405.10553 [pdf, other]

Revealing the Trade-off in ISAC Systems: The KL Divergence Perspective

Authors: Zesong Fei, Shuntian Tang, Xinyi Wang, Fanghao Xia, Fan Liu, J. Andrew Zhang

Abstract: Integrated sensing and communication (ISAC) is regarded as a promising technique for 6G communication networks. In this letter, we investigate the Pareto bound of the ISAC system in terms of a unified Kullback-Leibler (KL) divergence performance metric. We first present the relationship between KL divergence and explicit ISAC performance metrics, i.e., demodulation error and probability of detection. Thereafter, we investigate the impact of constellation and beamforming design on the Pareto bound via deep learning and semi-definite relaxation (SDR) techniques. Simulation results show the trade-off between sensing and communication performance in terms of bit error rate (BER) and probability of detection under different parameter set-ups.

Submitted 17 May, 2024; originally announced May 2024.

Comments: 5 pages, 5 figures; submitted to IEEE journals for possible publication
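The letter links KL divergence to detection probability and demodulation error. One standard reason KL divergence is a natural common metric is the Chernoff-Stein lemma: with the false-alarm probability held below a fixed level, the smallest achievable miss probability on n i.i.d. samples decays roughly as exp(-n·D). The snippet computes D for two Gaussian hypotheses. The scalar Gaussian model and the numbers are illustrative assumptions, not the letter's system model.

```python
import math

def kl_gaussian(mu0, var0, mu1, var1):
    # KL( N(mu0, var0) || N(mu1, var1) ) for univariate Gaussians, in nats.
    return 0.5 * (math.log(var1 / var0) + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)

# Assumed detection example: target absent (mean 0) vs. target present
# (mean a) in Gaussian noise of variance s2.
a, s2 = 2.0, 1.0
d = kl_gaussian(0.0, s2, a, s2)   # equals a**2 / (2 * s2)
```

A larger D, for example from stronger echo power, means a faster decay of the miss probability, which is the sensing side of the trade-off the letter analyzes.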

arXiv:2405.09234 [pdf, other]

Enhancing Image Privacy in Semantic Communication over Wiretap Channels leveraging Differential Privacy

Authors: Weixuan Chen, Shunpu Tang, Qianqian Yang

Abstract: Compared to traditional methods, semantic communication (SemCom) enhances transmission efficiency by sending only task-relevant information. However, transmitting semantic-rich data over insecure or public channels poses security and privacy risks. This paper addresses the privacy problem of transmitting images over wiretap channels and proposes a novel SemCom approach ensuring privacy through a differential privacy (DP)-based image protection and deprotection mechanism. The method utilizes the GAN inversion technique to extract disentangled semantic features and applies a DP mechanism to protect sensitive features within the extracted semantic information. To address the non-invertibility of DP, we introduce two neural networks to approximate the DP application and removal processes, offering a privacy protection level close to that of the original DP process. Simulation results validate the effectiveness of our method in preventing eavesdroppers from obtaining sensitive information while maintaining high-fidelity image reconstruction at the legitimate receiver.

Submitted 15 May, 2024; originally announced May 2024.
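The abstract does not specify which DP mechanism is applied to the sensitive features. As an illustration of one standard option only, the sketch below calibrates the classical Gaussian mechanism and applies it to a feature vector. The sensitivity, ε, δ, and the 512-dimensional latent are assumptions. A real deployment would also need to bound the features' sensitivity, for example by clipping, before this calibration is meaningful.

```python
import numpy as np

def gaussian_mechanism_sigma(l2_sensitivity, epsilon, delta):
    # Classical Gaussian-mechanism calibration (valid for 0 < epsilon < 1):
    # sigma = sensitivity * sqrt(2 * ln(1.25 / delta)) / epsilon.
    if not 0 < epsilon < 1:
        raise ValueError("this calibration assumes 0 < epsilon < 1")
    return l2_sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon

def protect_features(features, l2_sensitivity, epsilon, delta, rng):
    # Perturb only the (hypothetical) sensitive feature vector; non-sensitive
    # features would be transmitted unchanged.
    sigma = gaussian_mechanism_sigma(l2_sensitivity, epsilon, delta)
    return features + rng.normal(0.0, sigma, size=features.shape)

rng = np.random.default_rng(0)
sensitive = np.zeros(512)   # stand-in for a sensitive latent code (assumption)
noisy = protect_features(sensitive, l2_sensitivity=1.0, epsilon=0.5, delta=1e-5, rng=rng)
```

Because the added noise is random, the mapping cannot be inverted exactly, which is the non-invertibility the paper addresses with its two approximating networks.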

arXiv:2405.07442 [pdf]

Rene: A Pre-trained Multi-modal Architecture for Auscultation of Respiratory Diseases

Authors: Pengfei Zhang, Zhihang Zheng, Shichen Zhang, Minghao Yang, Shaojun Tang

Abstract: Compared with invasive examinations that require tissue sampling, respiratory sound testing is a non-invasive examination method that is safer and easier for patients to accept. In this study, we introduce Rene, a pioneering large-scale model tailored for respiratory sound recognition. Rene has been rigorously fine-tuned with an extensive dataset featuring a broad array of respiratory audio samples, targeting disease detection, sound pattern classification, and event identification. Our innovative approach applies a pre-trained speech recognition model to process respiratory sounds, augmented with patient medical records. The resulting multi-modal deep-learning framework addresses interpretability and real-time diagnostic challenges that have hindered previous respiratory-focused models. Benchmark comparisons reveal that Rene significantly outperforms existing models, achieving improvements of 10.27%, 16.15%, 15.29%, and 18.90% in respiratory event detection and audio classification on the SPRSound database. Disease prediction accuracy on the ICBHI database improved by 23% over the baseline in both mean average and harmonic scores. Moreover, we have developed a real-time respiratory sound discrimination system utilizing the Rene architecture. Employing state-of-the-art Edge AI technology, this system enables rapid and accurate responses for respiratory sound auscultation (https://github.com/zpforlove/Rene).

Submitted 6 June, 2024; v1 submitted 12 May, 2024; originally announced May 2024.

arXiv:2404.12170 [pdf, other]

Secure Semantic Communication for Image Transmission in the Presence of Eavesdroppers

Authors: Shunpu Tang, Chen Liu, Qianqian Yang, Shibo He, Dusit Niyato

Abstract: Semantic communication (SemCom) has emerged as a key technology for the forthcoming sixth-generation (6G) network, owing to its enhanced communication efficiency and robustness against channel noise. However, the open nature of wireless channels renders them vulnerable to eavesdropping, posing a serious threat to privacy. To address this issue, we propose a novel secure SemCom approach for image transmission, which integrates steganography technology to conceal private information within non-private images (host images). Specifically, we propose an invertible neural network (INN)-based signal steganography approach, which embeds channel input signals of a private image into those of a host image before transmission. This ensures that the original private image can be reconstructed from the received signals at the legitimate receiver, while the eavesdropper can only decode the information of the host image. Simulation results demonstrate that the proposed approach maintains reconstruction quality of both host and private images at the legitimate receiver comparable to scenarios without any security mechanism. Experiments also show that the eavesdropper is only able to reconstruct host images, showcasing the enhanced security provided by our approach.

Submitted 18 April, 2024; originally announced April 2024.
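The approach relies on the network being invertible: whatever the encoder does to the private and host signals can be undone exactly by a receiver that has the right inverse. The sketch below shows the generic mechanism behind many INNs, an additive coupling block whose inverse is available in closed form even though its sub-networks are not invertible. The layer sizes, random weights, and names are illustrative assumptions. The paper's signal-level embedding, training losses, and channel model are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)
W_f = rng.normal(size=(4, 4))   # hypothetical sub-network weights
W_g = rng.normal(size=(4, 4))

def f(x):
    # Arbitrary, non-invertible sub-network; a fixed random layer stands in for a trained one.
    return np.tanh(x @ W_f)

def g(x):
    return np.tanh(x @ W_g)

def coupling_forward(host, secret):
    # Two additive coupling steps: each branch is updated using only the other
    # branch, so the whole map stays exactly invertible whatever f and g are.
    mixed_host = host + f(secret)
    mixed_secret = secret + g(mixed_host)
    return mixed_host, mixed_secret

def coupling_inverse(mixed_host, mixed_secret):
    # Undo the two steps in reverse order.
    secret = mixed_secret - g(mixed_host)
    host = mixed_host - f(secret)
    return host, secret

host = rng.normal(size=(2, 4))
secret = rng.normal(size=(2, 4))
y_host, y_secret = coupling_forward(host, secret)
rec_host, rec_secret = coupling_inverse(y_host, y_secret)
```

Stacking such blocks keeps exact invertibility while allowing expressive sub-networks, which is why coupling layers are a common building block for INN-based hiding schemes.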

arXiv:2403.20237 [pdf, other]

Evolving Semantic Communication with Generative Model

Authors: Shunpu Tang, Qianqian Yang, Deniz Gündüz, Zhaoyang Zhang

Abstract: Recently, learning-based semantic communication (SemCom) has emerged as a promising approach in the upcoming 6G network, and researchers have made remarkable efforts in this field. However, existing works have yet to fully explore the advantages of the evolving nature of learning-based systems, where knowledge accumulated during transmission has the potential to enhance system performance. In this paper, we explore an evolving semantic communication system for image transmission, referred to as ESemCom, with the capability to continuously enhance transmission efficiency. The system features a novel channel-aware semantic encoder that utilizes a pre-trained Semantic StyleGAN to extract the channel-correlated latent variables consisting of several semantic vectors from the input images, which can be directly transmitted over a noisy channel without further channel coding. Moreover, we introduce a semantic caching mechanism that dynamically stores the transmitted semantic vectors in the local caching memory of both the transmitter and receiver. The cached semantic vectors are then exploited to eliminate the need to transmit similar codes in subsequent transmissions, thus further reducing communication overhead. Simulation results highlight the evolving performance of the proposed system in terms of transmission efficiency, achieving superior perceptual quality with an average bandwidth compression ratio (BCR) of 1/192 for a sequence of 100 testing images compared to DeepJSCC and Inverse JSCC with the same BCR.
Code of this paper is available at https://github.com/recusant7/GAN_SeCom.

Submitted 29 March, 2024; originally announced March 2024.
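The caching idea can be pictured as a small shared codebook that both ends update in lockstep. The sketch below is a hypothetical simplification: it sends a cache index instead of a semantic vector whenever some cached vector exceeds a cosine-similarity threshold. The threshold, the message format, and the matching rule are assumptions. The paper's StyleGAN encoder, channel model, and actual cache policy are not reproduced.

```python
import numpy as np

class SemanticCache:
    # Hypothetical cache kept identically at the transmitter and the receiver.
    def __init__(self, threshold):
        self.threshold = threshold  # cosine-similarity threshold (illustrative)
        self.entries = []

    def lookup(self, v):
        # Index of the most similar cached vector, or None if none is close enough.
        best_idx, best_sim = None, self.threshold
        for i, c in enumerate(self.entries):
            sim = float(v @ c / (np.linalg.norm(v) * np.linalg.norm(c)))
            if sim >= best_sim:
                best_idx, best_sim = i, sim
        return best_idx

    def add(self, v):
        self.entries.append(v.copy())
        return len(self.entries) - 1

def encode(cache, vectors):
    # Emit ("idx", i) for cache hits and ("vec", v) for misses; misses are cached.
    messages = []
    for v in vectors:
        idx = cache.lookup(v)
        if idx is None:
            cache.add(v)
            messages.append(("vec", v))
        else:
            messages.append(("idx", idx))
    return messages

def decode(cache, messages):
    # The receiver mirrors the transmitter's cache updates so indices stay aligned.
    out = []
    for kind, payload in messages:
        if kind == "vec":
            cache.add(payload)
            out.append(payload)
        else:
            out.append(cache.entries[payload])
    return out

tx, rx = SemanticCache(0.95), SemanticCache(0.95)
a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0])
msgs = encode(tx, [a, b, a * 1.01])
recovered = decode(rx, msgs)
```

The third vector is sent as an index and reconstructed from the cache, so repeated content costs only a few bits. The price is that a hit reuses the cached vector rather than the exact new one.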

arXiv:2403.03100 [pdf, other]

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Authors: Zeqian Ju, Yuancheng Wang, Kai Shen, Xu Tan, Detai Xin, Dongchao Yang, Yanqing Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu, Tao Qin, Xiang-Yang Li, Wei Ye, Shikun Zhang, Jiang Bian, Lei He, Jinyu Li, Sheng Zhao

Abstract: While recent large-scale text-to-speech (TTS) models have achieved significant progress, they still fall short in speech quality, similarity, and prosody. Considering that speech intricately encompasses various attributes (e.g., content, prosody, timbre, and acoustic details) that pose significant challenges for generation, a natural idea is to factorize speech into individual subspaces representing different attributes and generate them individually. Motivated by this, we propose NaturalSpeech 3, a TTS system with novel factorized diffusion models to generate natural speech in a zero-shot way. Specifically, 1) we design a neural codec with factorized vector quantization (FVQ) to disentangle speech waveform into subspaces of content, prosody, timbre, and acoustic details; 2) we propose a factorized diffusion model to generate attributes in each subspace following its corresponding prompt. With this factorization design, NaturalSpeech 3 can effectively and efficiently model intricate speech with disentangled subspaces in a divide-and-conquer way. Experiments show that NaturalSpeech 3 outperforms the state-of-the-art TTS systems on quality, similarity, prosody, and intelligibility, and achieves on-par quality with human recordings. Furthermore, we achieve better performance by scaling to 1B parameters and 200K hours of training data.

Submitted 23 April, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

Comments: Achieving human-level quality and naturalness on multi-speaker datasets (e.g., LibriSpeech) in a zero-shot way

arXiv:2402.06841 [pdf]

Point cloud-based registration and image fusion between cardiac SPECT MPI and CTA

Authors: Shaojie Tang, Penpen Miao, Xingyu Gao, Yu Zhong, Dantong Zhu, Haixing Wen, Zhihui Xu, Qiuyue Wei, Hongping Yao, Xin Huang, Rui Gao, Chen Zhao, Weihua Zhou

Abstract: A method was proposed for the point cloud-based registration and image fusion between cardiac single photon emission computed tomography (SPECT) myocardial perfusion images (MPI) and cardiac computed tomography angiograms (CTA). Firstly, the left ventricle (LV) epicardial regions (LVERs) in SPECT and CTA images were segmented by using different U-Net neural networks trained to generate the point clouds of the LV epicardial contours (LVECs). Secondly, according to the characteristics of cardiac anatomy, the special points of anterior and posterior interventricular grooves (APIGs) were manually marked in both SPECT and CTA image volumes. Thirdly, we developed an in-house program for coarsely registering the special points of APIGs to ensure a correct cardiac orientation alignment between SPECT and CTA images. Fourthly, we employed the ICP, SICP, or CPD algorithm to achieve a fine registration for the point clouds (together with the special points of APIGs) of the LV epicardial surfaces (LVERs) in SPECT and CTA images. Finally, the image fusion between SPECT and CTA was realized after the fine registration. The experimental results showed that the cardiac orientation was aligned well and the mean distance error of the optimal registration method (CPD with affine transform) was consistently less than 3 mm. The proposed method could effectively fuse the structures from cardiac CTA and SPECT functional images, and demonstrated potential to assist in the accurate diagnosis of cardiac diseases by combining the complementary advantages of the two imaging modalities.

Submitted 9 February, 2024; originally announced February 2024.
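Of the three fine-registration algorithms named above, ICP is the simplest to show. The sketch below is a minimal rigid point-to-point ICP with a closed-form SVD (Kabsch) fit and brute-force nearest neighbours. It is not the paper's pipeline: the best-performing method reported there is CPD with an affine transform, and the synthetic point cloud, perturbation, and iteration count here are assumptions.

```python
import numpy as np

def best_rigid_transform(src, dst):
    # Kabsch/SVD solution for R, t minimizing sum ||R @ src_i + t - dst_i||^2.
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    D = np.eye(src.shape[1])
    D[-1, -1] = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ D @ U.T
    t = mu_d - R @ mu_s
    return R, t

def icp(src, dst, iters=30):
    # Point-to-point ICP: alternate nearest-neighbour matching and rigid fitting.
    cur = src.copy()
    for _ in range(iters):
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)  # brute-force NN
        matched = dst[d2.argmin(axis=1)]
        R, t = best_rigid_transform(cur, matched)
        cur = cur @ R.T + t
    return cur

# Synthetic test: a point cloud and a slightly rotated and shifted copy of it.
rng = np.random.default_rng(0)
dst = rng.uniform(-5, 5, size=(40, 3))
a = np.deg2rad(2.0)
Rz = np.array([[np.cos(a), -np.sin(a), 0.0], [np.sin(a), np.cos(a), 0.0], [0.0, 0.0, 1.0]])
src = (dst - np.array([0.1, -0.05, 0.02])) @ Rz
aligned = icp(src, dst)
mean_err = np.linalg.norm(aligned - dst, axis=1).mean()
```

ICP only converges reliably from a good initial pose, which is consistent with the paper's use of the manually marked APIG points for coarse alignment before fine registration.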

arXiv:2308.05262 [pdf, other]

Robust Interference Mitigation techniques for Direct Position Estimation

Authors: Haoqing Li, Shuo Tang, Peng Wu, Pau Closas

Abstract: Global Navigation Satellite System (GNSS) is pervasive in navigation and positioning applications, where precise position and time referencing estimations are required. Conventional methods for GNSS positioning involve a two-step process, where intermediate measurements such as Doppler shift and time delay of received GNSS signals are computed and then used to solve for the receiver's position. Alternatively, Direct Position Estimation (DPE) was proposed to infer the position directly from the sampled signal without intermediate variables, yielding superior levels of sensitivity and operation under challenging environments. However, the positioning resilience of the DPE method is still threatened by various interferences. Robust Interference Mitigation (RIM) processing has been studied and proved to be efficient against various interferences in conventional two-step positioning (2SP) methods, and is therefore worth exploring for its potential to enhance DPE. This article extends DPE methodology by incorporating RIM strategies that address the increasing need to protect GNSS receivers against intentional or unintentional interferences, such as jamming signals, which can deny GNSS-based positioning. RIM, which leverages robust statistics, was shown to provide competitive results in two-step approaches and is here employed in a high-sensitivity DPE framework with successful results.
The article also quantifies the loss of efficiency incurred by using RIM when no interference is present and validates the proposed methodology on relevant interference cases; the approach can also be used to mitigate other common interference signals.

Submitted 9 August, 2023; originally announced August 2023.

arXiv:2307.09707 [pdf, other]

Improved Label Design for Timing Synchronization in OFDM Systems against Multi-path Uncertainty

Authors: Chaojin Qing, Shuhai Tang, Na Yang, Chuangui Rao, Jiafan Wang

Abstract: Timing synchronization (TS) is vital for orthogonal frequency division multiplexing (OFDM) systems, as it makes the discrete Fourier transform (DFT) window start in the inter-symbol-interference (ISI)-free region. However, the multi-path uncertainty in wireless communication scenarios degrades the TS correctness. To alleviate this degradation, we propose a learning-based TS method enhanced by an improved training-label design. In the proposed method, the classic cross-correlator extracts the initial TS feature to benefit the subsequent machine learning, and the network architecture unfolds one classic cross-correlation process. Against the multi-path uncertainty, a novel training label is designed by representing the ISI-free region and especially highlighting its approximate midpoint. Label values are set smaller the closer a position lies to the boundary of the ISI-free region, so that the maximum network output is expected to fall within the ISI-free region with high probability. Then, to guarantee the correctness of labeling, we exploit the a priori information of line-of-sight (LOS) to form a LOS-aided labeling. Numerical results confirm that the proposed training label effectively enhances the correctness of the proposed TS learner against the multi-path uncertainty.

Submitted 18 July, 2023; originally announced July 2023.

Comments: 5 pages, 5 figures
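The method above builds on the classic cross-correlator as its feature extractor. The sketch below shows that baseline on its own: the receiver slides a known preamble over the received samples and takes the peak of the correlation metric as the timing estimate. The preamble length, the two-path channel, and the noise level are assumptions. The learned network and label design from the paper are not reproduced.

```python
import numpy as np

def cross_correlation_timing(received, preamble):
    # Classic cross-correlation timing metric M(d) = |sum_k conj(p[k]) * r[d+k]|^2,
    # evaluated at every candidate offset d; the estimate is its argmax.
    L = len(preamble)
    metric = np.array([
        np.abs(np.vdot(preamble, received[d:d + L])) ** 2
        for d in range(len(received) - L + 1)
    ])
    return int(np.argmax(metric)), metric

rng = np.random.default_rng(0)
# Random QPSK preamble of 64 symbols (assumption).
preamble = (rng.choice([-1, 1], 64) + 1j * rng.choice([-1, 1], 64)) / np.sqrt(2)
tx = np.concatenate([np.zeros(100), preamble, np.zeros(100)])
h = np.array([1.0, 0.5])   # hypothetical two-path channel
rx = np.convolve(tx, h)[: len(tx)]
rx = rx + 0.05 * (rng.normal(size=rx.size) + 1j * rng.normal(size=rx.size))
est, metric = cross_correlation_timing(rx, preamble)
```

With a delayed second path, the metric also shows a secondary peak one sample later. When path delays and gains vary, the strongest peak can move away from the ISI-free region, which is the multi-path uncertainty the learned approaches target.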

arXiv:2307.00217 [pdf, other]

Metric Learning-Based Timing Synchronization by Using Lightweight Neural Network

Authors: Chaojin Qing, Na Yang, Shuhai Tang, Chuangui Rao, Jiafan Wang, Hui Lin

Abstract: Timing synchronization (TS) is one of the key tasks in orthogonal frequency division multiplexing (OFDM) systems. However, multi-path uncertainty corrupts the TS correctness, causing OFDM systems to suffer from severe inter-symbol interference (ISI). To tackle this issue, we propose a timing-metric learning-based TS method assisted by a lightweight one-dimensional convolutional neural network (1-D CNN). Specifically, the receptive field of the 1-D CNN is designed to extract the metric features from the classic synchronizer. Then, to combat the multi-path uncertainty, we employ the varying delays and gains of multi-path (the characteristics of multi-path uncertainty) to design the timing-metric objective, and thus form the training labels. This differs from existing timing-metric objectives, which are defined with respect to the timing synchronization point. Our method substantively increases the completeness of training data against the multi-path uncertainty due to the complete preservation of metric information. In this way, the TS correctness is improved against the multi-path uncertainty. Numerical results demonstrate the effectiveness and generalization of the proposed TS method against the multi-path uncertainty.

Submitted 1 July, 2023; originally announced July 2023.

Comments: 4 pages, 3 figures

arXiv:2306.17570 [pdf, other]

ELM-based Timing Synchronization for OFDM Systems by Exploiting Computer-aided Training Strategy

Authors: Mintao Zhang, Shuhai Tang, Chaojin Qing, Na Yang, Xi Cai, Jiafan Wang

Abstract: Due to the implementation bottleneck of training data collection in realistic wireless communications systems, supervised learning-based timing synchronization (TS) is challenged by the incompleteness of training data. To tackle this bottleneck, we extend the computer-aided approach, with which the local device can generate the training data instead of generating learning labels from the received samples collected in realistic systems, and then construct an extreme learning machine (ELM)-based TS network in orthogonal frequency division multiplexing (OFDM) systems. Specifically, by leveraging the rough information of channel impulse responses (CIRs), i.e., root-mean-square (r.m.s.) delay, we propose the loose constraint-based and flexible constraint-based training strategies for the learning-label design against the maximum multi-path delay. The underlying mechanism is to improve the completeness of multi-path delays that may appear in the realistic wireless channels and thus increase the statistical efficiency of the designed TS learner. By this means, the proposed ELM-based TS network can alleviate the degradation of generalization performance. Numerical results reveal the robustness and generalization of the proposed scheme against varying parameters.

Submitted 30 June, 2023; originally announced June 2023.

Comments: 12 pages, 7 figures
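For readers unfamiliar with extreme learning machines: the hidden layer is random and fixed, and only the output weights are fitted, in closed form by least squares. A minimal NumPy sketch follows. The tanh activation, the hidden-layer size, and the toy regression target are assumptions. The paper's TS-specific inputs, labels, and computer-aided training data are not reproduced.

```python
import numpy as np

class ELM:
    # Minimal single-hidden-layer extreme learning machine (illustrative).
    def __init__(self, n_hidden, rng):
        self.n_hidden = n_hidden
        self.rng = rng

    def fit(self, X, T):
        # Input weights and biases are drawn at random and never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)
        # Output weights: least-squares solution via the Moore-Penrose pseudoinverse.
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        return np.tanh(X @ self.W + self.b) @ self.beta

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
T = np.sin(3 * X)   # toy regression target (assumption)
model = ELM(n_hidden=50, rng=rng).fit(X, T)
mse = np.mean((model.predict(X) - T) ** 2)
```

Because training reduces to one pseudoinverse, an ELM can be retrained cheaply on locally generated data, which fits the computer-aided training strategy described above.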

arXiv:2306.08728 [pdf, other]

Towards trustworthy seizure onset detection using workflow notes

Authors: Khaled Saab, Siyi Tang, Mohamed Taha, Christopher Lee-Messer, Christopher Ré, Daniel Rubin

Abstract: A major barrier to deploying healthcare AI models is their trustworthiness. One form of trustworthiness is a model's robustness across different subgroups: while existing models may exhibit expert-level performance on aggregate metrics, they often rely on non-causal features, leading to errors in hidden subgroups. To take a step closer towards trustworthy seizure onset detection from EEG, we propose to leverage annotations that are produced by healthcare personnel in routine clinical workflows (which we refer to as workflow notes) that include multiple event descriptions beyond seizures. Using workflow notes, we first show that by scaling training data to an unprecedented level of 68,920 EEG hours, seizure onset detection performance significantly improves (+12.3 AUROC points) compared to relying on smaller training sets with expensive manual gold-standard labels. Second, we reveal that our binary seizure onset detection model underperforms on clinically relevant subgroups (e.g., up to a margin of 6.5 AUROC points between pediatrics and adults), while having significantly higher false positives on EEG clips showing non-epileptiform abnormalities compared to any EEG clip (+19 FPR points). To improve model robustness to hidden subgroups, we train a multilabel model that classifies 26 attributes other than seizures, such as spikes, slowing, and movement artifacts.
We find that our multilabel model significantly improves overall seizure onset detection performance (+5.9 AUROC points) while greatly improving performance among subgroups (up to +8.3 AUROC points), and decreases false positives on non-epileptiform abnormalities by 8 FPR points. Finally, we propose a clinical utility metric based on false positives per 24 EEG hours and find that our multilabel model improves this clinical utility metric by a factor of 2 across different clinical settings.

Submitted 14 June, 2023; originally announced June 2023.
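The abstract does not give the exact formula behind the proposed clinical utility metric. Assuming it is the raw false-positive count normalized to 24 hours of EEG, the arithmetic looks like this. The counts below are invented for illustration and are not results from the paper.

```python
def false_positives_per_24h(n_false_positives, eeg_hours):
    # Assumed definition: false-positive count normalized to 24 hours of EEG.
    if eeg_hours <= 0:
        raise ValueError("eeg_hours must be positive")
    return 24.0 * n_false_positives / eeg_hours

# Hypothetical counts over 480 hours of monitoring.
model_a = false_positives_per_24h(n_false_positives=90, eeg_hours=480)
model_b = false_positives_per_24h(n_false_positives=60, eeg_hours=480)
```

Normalizing per day of recording makes the number directly interpretable as the alarm burden a clinician would face, unlike an FPR computed over clips.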

arXiv:2303.06877 [pdf, other]

Progressive Open Space Expansion for Open-Set Model Attribution

Authors: Tianyun Yang, Danding Wang, Fan Tang, Xinying Zhao, Juan Cao, Sheng Tang

Abstract: Despite the remarkable progress in generative technology, the Janus-faced issues of intellectual property protection and malicious content supervision have arisen. Efforts have been made to manage synthetic images by attributing them to a set of potential source models. However, the closed-set classification setting limits the application in real-world scenarios for handling content generated by arbitrary models. In this study, we focus on a challenging task, namely Open-Set Model Attribution (OSMA), to simultaneously attribute images to known models and identify those from unknown ones. Compared to existing open-set recognition (OSR) tasks focusing on semantic novelty, OSMA is more challenging as the distinction between images from known and unknown models may only lie in visually imperceptible traces. To this end, we propose a Progressive Open Space Expansion (POSE) solution, which simulates open-set samples that maintain the same semantics as closed-set samples but are embedded with different imperceptible traces. Guided by a diversity constraint, the open space is simulated progressively by a set of lightweight augmentation models. We consider three real-world scenarios and construct an OSMA benchmark dataset, including unknown models trained with different random seeds, architectures, and datasets from known ones. Extensive experiments on the dataset demonstrate that POSE is superior to both existing model attribution methods and off-the-shelf OSR methods.

Submitted 13 March, 2023; originally announced March 2023.

Comments: accepted to CVPR2023

arXiv:2303.06221 [pdf, other]

Indirect Adaptive Optimal Control in the Presence of Input Saturation

Authors: Sunbochen Tang, Anuradha M. Annaswamy

Abstract: In this paper, we propose a combined Magnitude Saturated Adaptive Control (MSAC)-Model Predictive Control (MPC) approach to linear quadratic tracking optimal control problems with parametric uncertainties and input saturation. The proposed MSAC-MPC approach first focuses on a stable solution and parameter estimation, and switches to MPC when parameter learning is accomplished. We show that the MSAC, based on a high-order tuner, leads to parameter convergence to true values while providing stability guarantees. We also show that after switching to MPC, the optimality gap is well-defined and proportional to the parameter estimation error. We demonstrate the effectiveness of the proposed MSAC-MPC algorithm through a numerical example based on a linear second-order, two-input, unstable system.

Submitted 10 March, 2023; originally announced March 2023.

arXiv:2303.00160 [pdf, ps, other]

On Parametric Misspecified Bayesian Cramér-Rao bound: An application to linear Gaussian systems

Authors: Shuo Tang, Gerald LaMountain, Tales Imbiriba, Pau Closas

Abstract: A lower bound is an important tool for predicting the performance that an estimator can achieve under a particular statistical model. Bayesian bounds are one such class of bounds: they utilize not only the observation statistics but also prior model information. In reality, however, the true model generating the data is either unknown or simplified when deriving estimators, which motivates work on deriving estimation bounds under model mismatch. This paper provides a derivation of a Bayesian Cramér-Rao bound under model misspecification, defining important concepts such as the pseudotrue parameter that were not clearly identified in previous works. The general result is particularized to linear and Gaussian problems, where closed-form expressions are available and are used to validate the results.

Submitted 28 February, 2023; originally announced March 2023.

Comments: Accepted by ICASSP 2023
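For orientation only, the well-specified Bayesian Cramér-Rao bound for a linear Gaussian model can be computed in a few lines. This is a minimal sketch assuming y = Hx + n with prior x ~ N(μ, P) and noise n ~ N(0, R); the function names are hypothetical, and the misspecified bound derived in the paper is not reproduced here. In this correctly specified case the bound is the inverse of the Bayesian information HᵀR⁻¹H + P⁻¹, and it coincides with the posterior (MMSE) error covariance, which the check at the end exploits.

```python
import numpy as np


def bcrb_linear_gaussian(H, R, P):
    """Well-specified Bayesian CRB for y = H x + n, x ~ N(mu, P), n ~ N(0, R).

    Bayesian information = data term H^T R^{-1} H + prior term P^{-1};
    the bound on the error covariance is the inverse of that information.
    """
    J = H.T @ np.linalg.solve(R, H) + np.linalg.inv(P)
    return np.linalg.inv(J)


def posterior_covariance(H, R, P):
    """Posterior error covariance of x given y, written in Kalman-update form."""
    S = H @ P @ H.T + R              # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)   # gain
    return P - K @ H @ P


if __name__ == "__main__":
    # Hypothetical example: two unknowns observed through three noisy measurements.
    H = np.array([[1.0, 0.5], [0.0, 1.0], [2.0, -1.0]])
    R = 0.1 * np.eye(3)
    P = np.diag([1.0, 2.0])
    # In the linear Gaussian, correctly specified case the MMSE estimator attains the bound.
    print(np.allclose(bcrb_linear_gaussian(H, R, P), posterior_covariance(H, R, P)))
```

The equality follows from the matrix inversion lemma; under model misspecification the two quantities generally differ, which is the situation the paper addresses.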

arXiv:2302.13921 [pdf, other]

Autonomous Polycrystalline Material Decomposition for Hyperspectral Neutron Tomography

Authors: Mohammad Samin Nur Chowdhury, Diyu Yang, Shimin Tang, Singanallur V. Venkatakrishnan, Hassina Z. Bilheux, Gregery T. Buzzard, Charles A. Bouman

Abstract: Hyperspectral neutron tomography is an effective method for analyzing crystalline material samples with complex compositions in a non-destructive manner. Since the counts in the hyperspectral neutron radiographs directly depend on the neutron cross-sections, materials may exhibit contrasting neutron responses across wavelengths. Therefore, it is possible to extract the unique signatures associated with each material and use them to separate the crystalline phases simultaneously. We introduce an autonomous material decomposition (AMD) algorithm to automatically characterize and localize polycrystalline structures using Bragg edges with contrasting neutron responses from hyperspectral data. The algorithm estimates the linear attenuation coefficient spectra from the measured radiographs and then uses these spectra to perform polycrystalline material decomposition and to reconstruct 3D material volumes that localize materials in the spatial domain. Our results demonstrate that the method can accurately estimate both the linear attenuation coefficient spectra and associated reconstructions on both simulated and experimental neutron data.

Submitted 21 August, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

arXiv:2302.12397 [pdf, other]

Cascaded ELM-based Joint Frame Synchronization and Channel Estimation over Rician Fading Channel with Hardware Imperfections

Authors: Chaojin Qing, Chuangui Rao, Shuhai Tang, Na Yang, Jiafan Wang

Abstract: Due to the interdependency of frame synchronization (FS) and channel estimation (CE), joint FS and CE (JFSCE) schemes have been proposed to enhance their functionalities and thereby boost the overall performance of wireless communication systems. Although traditional JFSCE schemes alleviate the mutual influence between FS and CE, they show deficiencies in dealing with hardware imperfection (HI) and the deterministic line-of-sight (LOS) path. To tackle this challenge, we propose a cascaded ELM-based JFSCE to alleviate the influence of HI in the Rician fading channel scenario. Specifically, the conventional JFSCE method is first employed to extract the initial features, thus forming the non-neural network (NN) solutions for FS and CE, respectively. Then, the ELM-based networks, named FS-NET and CE-NET, are cascaded to capture the NN solutions of FS and CE. Simulation and analysis results show that, compared with the conventional JFSCE methods, the proposed cascaded ELM-based JFSCE significantly reduces the error probability of FS and the normalized mean square error (NMSE) of CE, even under parameter variations.

Submitted 23 February, 2023; originally announced February 2023.

Comments: 12 pages, 9 figures
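An extreme learning machine (ELM), the building block named in the title, is a single-hidden-layer network whose input weights are drawn at random and then frozen, so that only the output weights are solved for, in closed form, by least squares. The sketch below is a generic ELM under that textbook formulation; the sigmoid activation, the Gaussian random weights, and the class name are assumptions, and it does not reproduce the paper's FS-NET or CE-NET inputs, sizes, or training targets.

```python
import numpy as np


class ELM:
    """Generic single-hidden-layer extreme learning machine (illustrative sketch)."""

    def __init__(self, n_hidden, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Fixed random affine projection followed by a sigmoid nonlinearity.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, T):
        n_features = X.shape[1]
        # Input weights and biases are drawn once and never updated.
        self.W = self.rng.standard_normal((n_features, self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        # Only the output weights are learned: least-squares solution via the pseudoinverse.
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta
```

Because training is a single pseudoinverse rather than iterative backpropagation, ELMs are cheap to fit, which is a common reason they are chosen for lightweight receiver-side processing.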

arXiv:2301.10015 [pdf, other]

Deep Attention-Based Alignment Network for Melody Generation from Incomplete Lyrics

Authors: Gurunath Reddy M, Zhe Zhang, Yi Yu, Florian Harscoet, Simon Canales, Suhua Tang

Abstract: We propose a deep attention-based alignment network, which aims to automatically predict lyrics and melody with given incomplete lyrics as input in a way similar to the music creation of humans. Most importantly, a deep neural lyrics-to-melody net is trained in an encoder-decoder way to predict possible pairs of lyrics-melody when given incomplete lyrics (few keywords). The attention mechanism is exploited to align the predicted lyrics with the melody during the lyrics-to-melody generation. The qualitative and quantitative evaluation metrics reveal that the proposed method is indeed capable of generating proper lyrics and corresponding melody for composing new songs given a piece of incomplete seed lyrics.

Submitted 22 January, 2023; originally announced January 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2011.06380
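The alignment step relies on an attention mechanism. As a hedged illustration only, here is generic scaled dot-product attention, in which each query (for instance, a melody decoding step) receives a softmax distribution over keys (for instance, encoded lyric tokens). The abstract does not specify the paper's attention variant, so the scaling, the array shapes, and the function name are assumptions.

```python
import numpy as np


def dot_product_attention(queries, keys, values):
    """Generic scaled dot-product attention (illustrative sketch).

    queries: (n_q, d), keys: (n_k, d), values: (n_k, d_v).
    Returns (context, weights); weights[i, j] is how strongly query i aligns with key j.
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    # Subtract the row maximum before exponentiating for numerical stability.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)   # each row is a softmax distribution
    return weights @ values, weights
```

Each row of `weights` can be read as a soft alignment of one output step against all input positions; taking its argmax gives a hard alignment.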

arXiv:2212.02947 [pdf, other]

CNN-based Timing Synchronization for OFDM Systems Assisted by Initial Path Acquisition in Frequency Selective Fading Channel

Authors: Chaojin Qing, Na Yang, Shuhai Tang, Chuangui Rao, Jiafan Wang, Jinliang Chen

Abstract: Multi-path fading seriously affects the accuracy of timing synchronization (TS) in orthogonal frequency division multiplexing (OFDM) systems. To tackle this issue, we propose a convolutional neural network (CNN)-based TS scheme assisted by initial path acquisition in this paper. Specifically, the classic cross-correlation method is first employed to estimate a coarse timing offset and capture an initial path, which shrinks the TS search region. Then, a one-dimensional (1-D) CNN is developed to optimize the TS of OFDM systems. Due to the narrowed search region of TS, the CNN-based TS effectively locates the accurate TS point and inspires us to construct a lightweight network in terms of computational complexity and online running time. Compared with the compressed sensing-based TS method and extreme learning machine-based TS method, simulation results show that the proposed method can effectively improve the TS performance with reduced computational complexity and online running time. Besides, the proposed TS method is robust to varying parameters of multi-path fading channels.

Submitted 6 December, 2022; originally announced December 2022.

Comments: 5 pages, 3 figures
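The coarse first stage described in this abstract (classic cross-correlation against a known training sequence, used to narrow the search region before the CNN refines it) can be sketched as follows. This is an illustration under stated assumptions: a known preamble, a sliding correlation metric |Σₙ conj(s[n]) r[d+n]|, and a hypothetical `half_window` parameter that sets the width of the narrowed region. The paper's initial-path acquisition rule and its 1-D CNN are not reproduced.

```python
import numpy as np


def coarse_timing(received, preamble, half_window=4):
    """Coarse timing offset by sliding cross-correlation with a known preamble.

    Returns (d_hat, (lo, hi), metric): the offset maximizing the correlation
    magnitude, a narrowed search region around it, and the full metric.
    """
    L = len(preamble)
    # np.vdot conjugates its first argument: sum(conj(preamble) * segment).
    metric = np.array([abs(np.vdot(preamble, received[d:d + L]))
                       for d in range(len(received) - L + 1)])
    d_hat = int(np.argmax(metric))
    # Hypothetical narrowed region handed to a finer second stage.
    lo = max(d_hat - half_window, 0)
    hi = min(d_hat + half_window, len(metric) - 1)
    return d_hat, (lo, hi), metric
```

With a strong direct path, the correlation peak lands on the first arriving path, while a weaker echo produces a secondary peak a few samples later; a small window around the coarse estimate is therefore a natural search region for a refinement stage.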

arXiv:2212.00647 [pdf, other]

An Edge Alignment-based Orientation Selection Method for Neutron Tomography

Authors: Diyu Yang, Shimin Tang, Singanallur V. Venkatakrishnan, Mohammad S. N. Chowdhury, Yuxuan Zhang, Hassina Z. Bilheux, Gregery T. Buzzard, Charles A. Bouman

Abstract: Neutron computed tomography (nCT) is a 3D characterization technique used to image the internal morphology or chemical composition of samples in biology and materials sciences. A typical workflow involves placing the sample in the path of a neutron beam, acquiring projection data at a predefined set of orientations, and processing the resulting data using an analytic reconstruction algorithm. Typical nCT scans require hours to days to complete and are then processed using conventional filtered back-projection (FBP), which performs poorly with sparse views or noisy data. Hence, the main ways to reduce overall acquisition time are an improved sampling strategy combined with advanced reconstruction methods such as model-based iterative reconstruction (MBIR). In this paper, we propose an adaptive orientation selection method in which an MBIR reconstruction on previously-acquired measurements is used to define an objective function on orientations that balances a data-fitting term promoting edge alignment and a regularization term promoting orientation diversity. Using simulated and experimental data, we demonstrate that our method produces high-quality reconstructions using significantly fewer total measurements than the conventional approach.

Submitted 8 March, 2023; v1 submitted 1 December, 2022; originally announced December 2022.

arXiv:2211.11176 [pdf, other]

Modeling Multivariate Biosignals With Graph Neural Networks and Structured State Space Models

Authors: Siyi Tang, Jared A. Dunnmon, Liangqiong Qu, Khaled K. Saab, Tina Baykaner, Christopher Lee-Messer, Daniel L. Rubin

Abstract: Multivariate biosignals are prevalent in many medical domains, such as electroencephalography, polysomnography, and electrocardiography. Modeling spatiotemporal dependencies in multivariate biosignals is challenging due to (1) long-range temporal dependencies and (2) complex spatial correlations between the electrodes. To address these challenges, we propose representing multivariate biosignals as time-dependent graphs and introduce GraphS4mer, a general graph neural network (GNN) architecture that improves performance on biosignal classification tasks by modeling spatiotemporal dependencies in biosignals. Specifically, (1) we leverage the Structured State Space architecture, a state-of-the-art deep sequence model, to capture long-range temporal dependencies in biosignals and (2) we propose a graph structure learning layer in GraphS4mer to learn dynamically evolving graph structures in the data. We evaluate our proposed model on three distinct biosignal classification tasks and show that GraphS4mer consistently improves over existing models, including (1) seizure detection from electroencephalographic signals, outperforming a previous GNN with self-supervised pre-training by 3.1 points in AUROC; (2) sleep staging from polysomnographic signals, a 4.1 points improvement in macro-F1 score compared to existing sleep staging models; and (3) 12-lead electrocardiogram classification, outperforming previous state-of-the-art models by 2.7 points in macro-F1 score.

Submitted 29 April, 2023; v1 submitted 20 November, 2022; originally announced November 2022.

Comments: Published as a conference paper at CHIL 2023

arXiv:2210.10648 [pdf]

Preliminary Analysis of Channel Capacity in Air to ground LoS MIMO Communication Based on A Cloud Modeling Method

Authors: Ning Wei, Shuangqing Tang, Zeyuan Zhang

Abstract: Since the orthogonality of the line-of-sight multiple-input multiple-output (LoS MIMO) channel is only available within the Rayleigh distance, the coverage of communication systems is restricted by the finite antenna spacing that can be implemented. However, media with different permittivity in the transmission path are likely to loosen the requirement for antenna spacing. Such a conclusion could be enlightening in an air-to-ground LoS MIMO scenario, considering the existence of clouds in the troposphere. To analyze the random phase variations in the presence of a single-layer cloud, we propose and modify a new cloud modeling method suited to the LoS MIMO scenario based on real measurement data. Then, a preliminary analysis of channel capacity is conducted based on the simulation results.

Submitted 19 October, 2022; originally announced October 2022.

Comments: 14 pages

arXiv:2210.05762 [pdf, other]

Joint localization and classification of breast tumors on ultrasound images using a novel auxiliary attention-based framework

Authors: Zong Fan, Ping Gong, Shanshan Tang, Christine U. Lee, Xiaohui Zhang, Pengfei Song, Shigao Chen, Hua Li

Abstract: Automatic breast lesion detection and classification is an important task in computer-aided diagnosis, in which breast ultrasound (BUS) imaging is a common and frequently used screening tool. Recently, a number of deep learning-based methods have been proposed for joint localization and classification of breast lesions using BUS images. In these methods, features extracted by a shared network trunk are followed by two independent network branches to achieve classification and localization. Improper information sharing might cause conflicts in feature optimization between the two branches and lead to performance degradation. Also, these methods generally require large amounts of pixel-level annotated data for model training. To overcome these limitations, we propose a novel joint localization and classification model based on the attention mechanism and a disentangled semi-supervised learning strategy. The model used in this study is composed of a classification network and an auxiliary lesion-aware network. By use of the attention mechanism, the auxiliary lesion-aware network can optimize multi-scale intermediate feature maps and extract rich semantic information to improve classification and localization performance. The disentangled semi-supervised learning strategy only requires incomplete training datasets for model training. The proposed modularized framework allows flexible network replacement, so that it can be generalized to various applications. Experimental results on two different breast ultrasound image datasets demonstrate the effectiveness of the proposed method. The impacts of various network factors on model performance are also investigated to gain deeper insight into the designed framework.

Submitted 11 October, 2022; originally announced October 2022.

arXiv:2209.06451 [pdf, other]

Lightweight 1-D CNN-based Timing Synchronization for OFDM Systems with CIR Uncertainty

Authors: Chaojin Qing, Shuhai Tang, Xi Cai, Jiafan Wang

Abstract: In this letter, a lightweight one-dimensional convolutional neural network (1-D CNN)-based timing synchronization (TS) method is proposed to reduce the computational complexity and processing delay while maintaining timing accuracy in orthogonal frequency division multiplexing (OFDM) systems. Specifically, the TS task is first transformed into a deep learning (DL)-based classification task, and then three iterations of the compressed sensing (CS)-based TS strategy are simplified to form a lightweight network, whose CNN layers are specially designed to highlight the classification features. Besides, to enhance the generalization performance of the proposed method against channel impulse response (CIR) uncertainty, the relaxed restriction on propagation delay is exploited to augment the completeness of training data. Numerical results show that the proposed 1-D CNN-based TS method effectively improves the TS accuracy, reduces the computational complexity and processing delay, and generalizes well under CIR uncertainty. The source codes of the proposed method are available at https://github.com/qingchj851/CNNTS.

Submitted 14 September, 2022; originally announced September 2022.

Comments: 5 pages, 5 figures

arXiv:2207.04211 [pdf, other]

BOSS: Bottom-up Cross-modal Semantic Composition with Hybrid Counterfactual Training for Robust Content-based Image Retrieval

Authors: Wenqiao Zhang, Jiannan Guo, Mengze Li, Haochen Shi, Shengyu Zhang, Juncheng Li, Siliang Tang, Yueting Zhuang

Abstract: Content-Based Image Retrieval (CIR) aims to search for a target image by concurrently comprehending the composition of an example image and a complementary text, which potentially impacts a wide variety of real-world applications, such as internet search and fashion retrieval. In this scenario, the input image serves as an intuitive context and background for the search, while the corresponding language expressly requests new traits on how specific characteristics of the query image should be modified in order to get the intended target image. This task is challenging since it necessitates learning and understanding the composite image-text representation by incorporating cross-granular semantic updates. In this paper, we tackle this task with a novel Bottom-up crOss-modal Semantic compoSition (BOSS) with Hybrid Counterfactual Training framework, which sheds new light on the CIR task by studying it from two previously overlooked perspectives: implicitly bottom-up composition of visiolinguistic representation and explicitly fine-grained correspondence of query-target construction. On the one hand, we leverage the implicit interaction and composition of cross-modal embeddings from the bottom local characteristics to the top global semantics, preserving and transforming the visual representation conditioned on language semantics in several continuous steps for effective target image search. On the other hand, we devise a hybrid counterfactual training strategy that can reduce the model's ambiguity for similar queries.

Submitted 9 July, 2022; originally announced July 2022.

arXiv:2206.03603 [pdf]

doi 10.1016/j.compbiomed.2023.106954

A new method incorporating deep learning with shape priors for left ventricular segmentation in myocardial perfusion SPECT images

Authors: Fubao Zhu, Jinyu Zhao, Chen Zhao, Shaojie Tang, Jiaofen Nan, Yanting Li, Zhongqiang Zhao, Jianzhou Shi, Zenghong Chen, Zhixin Jiang, Weihua Zhou

Abstract: Background: The assessment of left ventricular (LV) function by myocardial perfusion SPECT (MPS) relies on accurate myocardial segmentation. The purpose of this paper is to develop and validate a new method incorporating deep learning with shape priors to accurately extract the LV myocardium for automatic measurement of LV functional parameters. Methods: A segmentation architecture that integrates a three-dimensional (3D) V-Net with a shape deformation module was developed. Using the shape priors generated by a dynamic programming (DP) algorithm, the model output was then constrained and guided during the model training for quick convergence and improved performance. A stratified 5-fold cross-validation was used to train and validate our models. Results: Results of our proposed method agree well with the ground truth. Our proposed model achieved Dice similarity coefficients (DSC) of 0.9573(0.0244), 0.9821(0.0137), and 0.9903(0.0041), and Hausdorff distances (HD) of 6.7529(2.7334) mm, 7.2507(3.1952) mm, and 7.6121(3.0134) mm in extracting the endocardium, myocardium, and epicardium, respectively. Conclusion: Our proposed method achieved high accuracy in extracting LV myocardial contours and assessing LV function.

Submitted 7 June, 2022; originally announced June 2022.

Comments: 21 pages, 14 figures
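The two reported metrics can be computed as follows. This is a minimal sketch over binary voxel masks: Dice is 2|A∩B| / (|A| + |B|), and the Hausdorff distance is taken here between all foreground voxels with an assumed isotropic `spacing` in mm per voxel. The paper may instead measure HD on extracted contours or surfaces, and the brute-force pairwise distance matrix is only suitable for small masks; both masks are assumed to be non-empty.

```python
import numpy as np


def dice(a, b):
    """Dice similarity coefficient 2|A∩B| / (|A| + |B|) for two binary masks."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())


def hausdorff(a, b, spacing=1.0):
    """Symmetric Hausdorff distance between the foreground voxels of two masks.

    spacing (assumed isotropic) converts index distances to physical units, e.g. mm.
    """
    pa = np.argwhere(a) * spacing
    pb = np.argwhere(b) * spacing
    # Pairwise Euclidean distances between every voxel of A and every voxel of B.
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    # Worst-case nearest-neighbor distance, taken in both directions.
    return max(d.min(axis=1).max(), d.min(axis=0).max())
```

For example, two 4×4 squares offset by one column overlap in 12 of 16 pixels each, giving a Dice of 24/32 = 0.75 and a Hausdorff distance of one pixel.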

arXiv:2205.14833 [pdf, other]

Walle: An End-to-End, General-Purpose, and Large-Scale Production System for Device-Cloud Collaborative Machine Learning

Authors: Chengfei Lv, Chaoyue Niu, Renjie Gu, Xiaotang Jiang, Zhaode Wang, Bin Liu, Ziqi Wu, Qiulin Yao, Congyu Huang, Panos Huang, Tao Huang, Hui Shu, Jinde Song, Bin Zou, Peng Lan, Guohuan Xu, Fei Wu, Shaojie Tang, Fan Wu, Guihai Chen

Abstract: To break the bottlenecks of the mainstream cloud-based machine learning (ML) paradigm, we adopt device-cloud collaborative ML and build the first end-to-end and general-purpose system, called Walle, as the foundation. Walle consists of a deployment platform, distributing ML tasks to billion-scale devices in time; a data pipeline, efficiently preparing task input; and a compute container, providing a cross-platform and high-performance execution environment, while facilitating daily task iteration. Specifically, the compute container is based on Mobile Neural Network (MNN), a tensor compute engine along with the data processing and model execution libraries, which are exposed through a refined Python thread-level virtual machine (VM) to support diverse ML tasks and concurrent task execution. The core of MNN consists of novel mechanisms for operator decomposition and semi-auto search, sharply reducing the workload of manually optimizing hundreds of operators for tens of hardware backends and quickly identifying the best backend with runtime optimization for a computation graph. The data pipeline introduces an on-device stream processing framework to enable processing user behavior data at the source. The deployment platform releases ML tasks with an efficient push-then-pull method and supports multi-granularity deployment policies. We evaluate Walle in practical e-commerce application scenarios to demonstrate its effectiveness, efficiency, and scalability. Extensive micro-benchmarks also highlight the superior performance of MNN and the Python thread-level VM. Walle has been in large-scale production use at Alibaba, while MNN has been open-sourced with a broad impact in the community.

Submitted 29 May, 2022; originally announced May 2022.

Comments: Accepted by OSDI 2022

arXiv:2205.12501 [pdf, ps, other]

Using Loaded N-port Structures to Achieve the Continuous-Space Electromagnetic Channel Capacity Bound

Authors: Zixiang Han, Shanpu Shen, Yujie Zhang, Shiwen Tang, Chi-Yuk Chiu, Ross Murch

Abstract: A method for achieving the continuous-space electromagnetic channel capacity bound using loaded N-port structures is described. It is relevant for the design of compact multiple-input multiple-output (MIMO) antennas that can achieve channel capacity bounds when constrained by size. The method is not restricted to a specific antenna configuration, and closed-form expressions for the channel capacity limits are provided under various constraints. Furthermore, using loaded N-port structures to represent arbitrary antenna geometries, an efficient optimization approach is proposed for finding the optimum MIMO antenna design that achieves the channel capacity bounds. Simulation results of the channel capacity bounds achieved using our MIMO antenna design with a size of one square wavelength are provided. These show that at least 18 ports can be supported in one square wavelength while achieving the continuous-space electromagnetic channel capacity bound. The results demonstrate that our method can link continuous-space electromagnetic channel capacity bounds to MIMO antenna design.

Submitted 25 May, 2022; originally announced May 2022.

arXiv:2202.04542 [pdf, other]

Spectrally Adaptive Common Spatial Patterns

Authors: Mahta Mousavi, Eric Lybrand, Shuangquan Feng, Shuai Tang, Rayan Saab, Virginia de Sa

Abstract: The method of Common Spatial Patterns (CSP) is widely used for feature extraction of electroencephalography (EEG) data, such as in motor imagery brain-computer interface (BCI) systems. It is a data-driven method estimating a set of spatial filters so that the power of the filtered EEG signal is maximized for one motor imagery class and minimized for the other. This method, however, is prone to overfitting and is known to suffer from poor generalization, especially with limited calibration data. Additionally, due to the high heterogeneity in brain data and the non-stationarity of brain activity, CSP is usually trained for each user separately, resulting in long calibration sessions or frequent re-calibrations that are tiring for the user. In this work, we propose a novel algorithm called Spectrally Adaptive Common Spatial Patterns (SACSP) that improves CSP by learning a temporal/spectral filter for each spatial filter so that the spatial filters are concentrated on the most relevant temporal frequencies for each user. We show the efficacy of SACSP in providing better generalizability and higher classification accuracy from calibration to online control compared to existing methods. Furthermore, we show that SACSP provides neurophysiologically relevant information about the temporal frequencies of the filtered signals. Our results highlight the differences in the motor imagery signal among BCI users as well as spectral differences in the signals generated for each class, and show the importance of learning robust user-specific features in a data-driven manner.

Submitted 9 February, 2022; originally announced February 2022.
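For reference, the standard two-class CSP baseline that SACSP builds on reduces to a generalized eigenvalue problem: find filters w that maximize wᵀC_a w / wᵀ(C_a + C_b) w, where C_a and C_b are the average trace-normalized spatial covariances of the two classes. The sketch below solves it by whitening the composite covariance with NumPy only. The function names and the normalized log-variance feature are conventional choices assumed here, trials are assumed to be zero-mean (e.g. band-pass filtered), and the per-filter spectral filters that SACSP learns are not included.

```python
import numpy as np


def _avg_cov(trials):
    """Average trace-normalized spatial covariance; trials: (n_trials, n_channels, n_samples)."""
    covs = [x @ x.T / np.trace(x @ x.T) for x in trials]
    return np.mean(covs, axis=0)


def csp_filters(trials_a, trials_b, n_pairs=1):
    """Standard two-class CSP: returns (filters as rows, class-a power share per filter)."""
    Ca, Cb = _avg_cov(trials_a), _avg_cov(trials_b)
    # Whiten the composite covariance Ca + Cb.
    evals, U = np.linalg.eigh(Ca + Cb)
    P = np.diag(evals ** -0.5) @ U.T
    # In the whitened space, eigenvectors of P Ca P^T are the CSP directions;
    # eigenvalues lie in [0, 1] and give the share of power belonging to class a.
    lam, V = np.linalg.eigh(P @ Ca @ P.T)
    W = V.T @ P                                # rows satisfy W (Ca + Cb) W^T = I
    order = np.argsort(lam)[::-1]              # most class-a-dominant first
    picks = np.r_[order[:n_pairs], order[-n_pairs:]]
    return W[picks], lam[picks]


def log_var_features(W, trial):
    """Normalized log-variance of the spatially filtered trial, a common CSP feature."""
    v = (W @ trial).var(axis=1)
    return np.log(v / v.sum())
```

Filters at the top of the ordering pass the most class-a power and those at the bottom the most class-b power, so a small number of pairs from each end usually suffices as input to a linear classifier.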

arXiv:2201.00100 [pdf, other]

doi 10.1109/TIP.2021.3139232

Boosting RGB-D Saliency Detection by Leveraging Unlabeled RGB Images

Authors: Xiaoqiang Wang, Lei Zhu, Siliang Tang, Huazhu Fu, Ping Li, Fei Wu, Yi Yang, Yueting Zhuang

Abstract: Training deep models for RGB-D salient object detection (SOD) often requires a large number of labeled RGB-D images. However, RGB-D data is not easily acquired, which limits the development of RGB-D SOD techniques. To alleviate this issue, we present a Dual-Semi RGB-D Salient Object Detection Network (DS-Net) to leverage unlabeled RGB images for boosting RGB-D saliency detection. We first devise a depth decoupling convolutional neural network (DDCNN), which contains a depth estimation branch and a saliency detection branch. The depth estimation branch is trained with RGB-D images and then used to estimate the pseudo depth maps for all unlabeled RGB images to form the paired data. The saliency detection branch is used to fuse the RGB and depth features to predict the RGB-D saliency. Then, the whole DDCNN is assigned as the backbone in a teacher-student framework for semi-supervised learning. Moreover, we also introduce a consistency loss on the intermediate attention and saliency maps for the unlabeled data, as well as a supervised depth and saliency loss for labeled data. Experimental results on seven widely-used benchmark datasets demonstrate that our DDCNN outperforms state-of-the-art methods both quantitatively and qualitatively. We also demonstrate that our semi-supervised DS-Net can further improve the performance, even when using an RGB image with the pseudo depth map.

Submitted 31 December, 2021; originally announced January 2022.

Comments: Accepted by IEEE TIP
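
The teacher-student stage of DS-Net can be pictured with a mean-teacher style sketch: the teacher's weights track an exponential moving average (EMA) of the student's, and a consistency loss penalizes disagreement between their maps on unlabeled inputs. The EMA update, the 0.99 decay, the mean-squared-error form of the consistency term, and the dict-of-arrays parameter layout are all assumptions for illustration; the abstract does not say how the teacher is updated.

```python
import numpy as np


def ema_update(teacher, student, decay=0.99):
    """Move each teacher parameter toward the student's by an exponential moving average."""
    return {name: decay * teacher[name] + (1.0 - decay) * student[name] for name in teacher}


def consistency_loss(student_map, teacher_map):
    """Mean squared difference between student and teacher maps on an unlabeled image."""
    return float(np.mean((np.asarray(student_map) - np.asarray(teacher_map)) ** 2))


# Toy run: a fixed student pulls a zero-initialized teacher toward it.
teacher = {"w": np.zeros(3)}
student = {"w": np.ones(3)}
for _ in range(10):
    teacher = ema_update(teacher, student)
# After k updates each teacher weight equals 1 - decay**k.
```

In a real pipeline the student would be trained by gradient descent on the supervised plus consistency losses, and only the student receives gradients.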

arXiv:2112.11611 [pdf, ps, other]

Continuous Optimization-Based Drift Counteraction Optimal Control: A Spacecraft Attitude Control Case Study

Authors: Sunbochen Tang, Nan Li, Robert A. E. Zidek, Ilya Kolmanovsky

Abstract: This paper presents a continuous optimization approach to drift counteraction optimal control (DCOC) and its application to spacecraft high-precision attitude control. The approach computes a control input sequence that maximizes the time-before-exit by solving a nonlinear programming problem with an exponentially weighted cost function and purely continuous variables. Based on results from sensitivity analysis and the exact penalty method, we prove an optimality guarantee for our approach. The practical application of our approach is demonstrated through a spacecraft high-precision attitude control example. A nominal case with three functional reaction wheels (RWs) and an underactuated case with only two functional RWs were considered. Simulation results illustrate the effectiveness of our approach as a contingency method for extending a spacecraft's effective mission time in the case of RW failures.

Submitted 21 December, 2021; originally announced December 2021.

Comments: Submitted to the AIAA Journal of Guidance, Control, and Dynamics as an Engineering Note
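
The quantity DCOC maximizes, time-before-exit, is simple to state for a discrete-time system: the number of steps the state stays inside an admissible set under a given control sequence. The sketch below evaluates it for a hypothetical scalar model x[k+1] = a*x[k] + b*u[k] + drift with admissible set |x| <= x_max; the model and its parameters are illustrative assumptions, not the spacecraft attitude dynamics. Because this count is integer-valued, maximizing it directly is a discrete problem; the paper instead works with an exponentially weighted cost over purely continuous variables, which is not reproduced here.

```python
def time_before_exit(x0, controls, a=1.0, b=1.0, drift=0.25, x_max=1.0):
    """Number of steps completed before the state first leaves |x| <= x_max.

    Hypothetical scalar model: x[k+1] = a * x[k] + b * u[k] + drift.
    """
    x = x0
    for k, u in enumerate(controls):
        x = a * x + b * u + drift
        if abs(x) > x_max:
            return k  # step k + 1 left the admissible set
    return len(controls)  # never exited within the horizon


# Uncontrolled, the drift pushes x to 0.25, 0.5, 0.75, 1.0, then 1.25 (exit).
print(time_before_exit(0.0, [0.0] * 10))    # 4
# A control that exactly cancels the drift keeps x at 0 for the whole horizon.
print(time_before_exit(0.0, [-0.25] * 10))  # 10
```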

arXiv:2107.13177 [pdf, other]

Label Design-based ELM Network for Timing Synchronization in OFDM Systems with Nonlinear Distortion

Authors: Chaojin Qing, Shuhai Tang, Chuangui Rao, Qing Ye, Jiafan Wang, Chuan Huang

Abstract: Due to the nonlinear distortion in orthogonal frequency division multiplexing (OFDM) systems, the timing synchronization (TS) performance is inevitably degraded at the receiver. To relieve this issue, we propose an extreme learning machine (ELM)-based network with a novel learning label for the TS of OFDM systems, which increases the probability that the symbol timing offset (STO) estimate falls within the inter-symbol interference (ISI)-free region. Specifically, by exploiting prior information about the ISI-free region, two types of learning labels are developed to facilitate the ELM-based TS network. With the designed learning labels, a classic TS scheme is first executed to capture the coarse timing metric (TM), followed by an ELM network that refines the TM. Experiments and analysis show that our scheme improves TS performance and demonstrate its generalization performance across different training and testing channel scenarios.

Submitted 28 July, 2021; originally announced July 2021.

Comments: 5 pages, 6 figures, VTC2021
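
An extreme learning machine is a single-hidden-layer network whose input weights and biases are drawn at random and then frozen; only the output weights are fit, in closed form, by (regularized) least squares. The sketch below is a generic ELM in NumPy; the tanh activation, the hidden size, the ridge term, and the toy regression target are assumptions, and the paper's learning-label design and timing-metric inputs are not modeled.

```python
import numpy as np


class ELM:
    """Minimal extreme learning machine: random fixed hidden layer + least-squares readout."""

    def __init__(self, n_in, n_hidden, ridge=1e-6, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((n_in, n_hidden))  # random input weights, never trained
        self.b = rng.standard_normal(n_hidden)          # random biases, never trained
        self.ridge = ridge
        self.beta = None                                # output weights, fit in fit()

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, T):
        H = self._hidden(X)
        # Ridge-regularized least squares: beta = (H^T H + ridge * I)^{-1} H^T T
        A = H.T @ H + self.ridge * np.eye(H.shape[1])
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta


# Toy regression standing in for "coarse metric -> refined metric".
X = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
T = np.sin(2.0 * X)
model = ELM(n_in=1, n_hidden=50).fit(X, T)
mse = float(np.mean((model.predict(X) - T) ** 2))
```

Training reduces to one linear solve, which is the usual argument for ELMs in latency-sensitive receiver processing.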

arXiv:2107.04721 [pdf, other]

doi 10.1007/978-3-030-87000-3_7

U-Net with Hierarchical Bottleneck Attention for Landmark Detection in Fundus Images of the Degenerated Retina

Authors: Shuyun Tang, Ziming Qi, Jacob Granley, Michael Beyeler

Abstract: Fundus photography has routinely been used to document the presence and severity of retinal degenerative diseases such as age-related macular degeneration (AMD), glaucoma, and diabetic retinopathy (DR) in clinical practice, for which the fovea and optic disc (OD) are important retinal landmarks. However, the occurrence of lesions, drusen, and other retinal abnormalities during retinal degeneration severely complicates automatic landmark detection and segmentation. Here we propose HBA-U-Net: a U-Net backbone enriched with hierarchical bottleneck attention. The network consists of a novel bottleneck attention block that combines and refines self-attention, channel attention, and relative-position attention to highlight retinal abnormalities that may be important for fovea and OD segmentation in the degenerated retina. HBA-U-Net achieved state-of-the-art results on fovea detection across datasets and eye conditions (ADAM: Euclidean Distance (ED) of 25.4 pixels, REFUGE: 32.5 pixels, IDRiD: 32.1 pixels), on OD segmentation for AMD (ADAM: Dice Coefficient (DC) of 0.947), and on OD detection for DR (IDRiD: ED of 20.5 pixels). Our results suggest that HBA-U-Net may be well suited for landmark detection in the presence of a variety of retinal degenerative diseases.

Submitted 9 July, 2021; originally announced July 2021.

Journal ref: Ophthalmic Medical Image Analysis 2021
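
The two metrics reported for HBA-U-Net can be computed as below: the Euclidean distance (ED) in pixels between a predicted and a reference landmark, and the Dice coefficient (DC) between binary segmentation masks. This is a generic sketch; the convention that two empty masks score 1.0 is an assumption, not something the abstract specifies.

```python
import numpy as np


def dice_coefficient(pred, target):
    """Dice = 2|A and B| / (|A| + |B|) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return float(2.0 * np.logical_and(pred, target).sum() / denom)


def euclidean_distance(p, q):
    """Pixel distance between predicted and reference landmark coordinates (row, col)."""
    return float(np.linalg.norm(np.asarray(p, dtype=float) - np.asarray(q, dtype=float)))


# Two 8-pixel masks that share one row of 4 pixels.
a = np.zeros((4, 4)); a[:2] = 1
b = np.zeros((4, 4)); b[1:3] = 1
print(dice_coefficient(a, b))               # 0.5
print(euclidean_distance((0, 0), (3, 4)))   # 5.0
```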

arXiv:2106.04043 [pdf, other]

Dilated Convolution based CSI Feedback Compression for Massive MIMO Systems

Authors: Shunpu Tang, Junjuan Xia, Lisheng Fan, Xianfu Lei, Wei Xu, Arumugam Nallanathan

Abstract: Although the frequency-division duplex (FDD) massive multiple-input multiple-output (MIMO) system can offer high spectral and energy efficiency, it requires feeding back the downlink channel state information (CSI) from users to the base station (BS) in order to fulfill the precoding design at the BS. However, the large dimension of CSI matrices in the massive MIMO system makes the CSI feedback very challenging, and it is urgent to compress the feedback CSI. To this end, this paper proposes a novel dilated convolution based CSI feedback network, namely DCRNet. Specifically, dilated convolutions are used to enlarge the receptive field (RF) of the proposed DCRNet without increasing the convolution size. Moreover, advanced encoder and decoder blocks are designed to improve the reconstruction performance and reduce computational complexity as well. Numerical results are presented to show the superiority of the proposed DCRNet over conventional networks. In particular, the proposed DCRNet achieves nearly state-of-the-art (SOTA) performance with far fewer floating point operations (FLOPs). The open source code and checkpoint of this work are available at https://github.com/recusant7/DCRNet.

Submitted 7 June, 2021; originally announced June 2021.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
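
The receptive-field argument behind dilated convolutions is plain arithmetic: a size-k kernel with dilation d spans k + (k - 1)(d - 1) input positions while keeping only k weights per axis, and each stacked stride-1 layer adds (span - 1) to the receptive field. The layer configurations below are hypothetical examples, not DCRNet's actual architecture.

```python
def effective_kernel(k, d):
    """Number of input positions spanned by a size-k kernel with dilation d (one axis)."""
    return k + (k - 1) * (d - 1)


def receptive_field(layers):
    """Receptive field (one axis) of stacked stride-1 convolutions given (kernel, dilation) pairs."""
    rf = 1
    for k, d in layers:
        rf += effective_kernel(k, d) - 1
    return rf


# Three 3x3 layers: plain vs. dilations 1, 2, 4 (same number of weights in both stacks).
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # 7
print(receptive_field([(3, 1), (3, 2), (3, 4)]))  # 15
```

Both stacks have identical parameter counts, which is why dilation enlarges the RF "without increasing the convolution size".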

arXiv:2104.08336 [pdf, other]

Self-Supervised Graph Neural Networks for Improved Electroencephalographic Seizure Analysis

Authors: Siyi Tang, Jared A. Dunnmon, Khaled Saab, Xuan Zhang, Qianying Huang, Florian Dubost, Daniel L. Rubin, Christopher Lee-Messer

Abstract: Automated seizure detection and classification from electroencephalography (EEG) can greatly improve seizure diagnosis and treatment. However, several modeling challenges remain unaddressed in prior automated seizure detection and classification studies: (1) representing the non-Euclidean data structure in EEGs, (2) accurately classifying rare seizure types, and (3) the lack of a quantitative interpretability approach to measure a model's ability to localize seizures. In this study, we address these challenges by (1) representing the spatiotemporal dependencies in EEGs using a graph neural network (GNN) and proposing two EEG graph structures that capture the electrode geometry or dynamic brain connectivity, (2) proposing a self-supervised pre-training method that predicts preprocessed signals for the next time period to further improve model performance, particularly on rare seizure types, and (3) proposing a quantitative model interpretability approach to assess a model's ability to localize seizures within EEGs. When evaluating our approach on seizure detection and classification on a large public dataset, we find that our GNN with self-supervised pre-training achieves 0.875 Area Under the Receiver Operating Characteristic Curve on seizure detection and 0.749 weighted F1-score on seizure classification, outperforming previous methods for both seizure detection and classification. Moreover, our self-supervised pre-training strategy significantly improves classification of rare seizure types. Furthermore, quantitative interpretability analysis shows that our GNN with self-supervised pre-training precisely localizes 25.4% of focal seizures, a 21.9-point improvement over existing CNNs. Finally, by superimposing the identified seizure locations on both raw EEG signals and EEG graphs, our approach could provide clinicians with an intuitive visualization of localized seizure regions.

Submitted 13 March, 2022; v1 submitted 16 April, 2021; originally announced April 2021.

Comments: Published as a conference paper at ICLR 2022

Journal ref: ICLR 2022
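
One common way to turn electrode geometry into a graph is a thresholded Gaussian kernel on pairwise electrode distances: nearby electrodes get strong edges and distant ones none. The sketch below assumes that construction, a kernel width equal to the standard deviation of all pairwise distances, and a 0.9 threshold; none of these choices is stated in the abstract, the electrode coordinates are made up, and the dynamic (correlation-based) graph is not shown.

```python
import numpy as np


def distance_graph(coords, threshold=0.9):
    """Weighted adjacency over electrodes: Gaussian kernel of pairwise distance, thresholded.

    Kernel width is the standard deviation of all pairwise distances (an assumed choice).
    """
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    sigma = dist[np.triu_indices(len(coords), k=1)].std()
    adj = np.exp(-(dist ** 2) / sigma ** 2)
    adj[adj < threshold] = 0.0  # drop weak (distant) edges to keep the graph sparse
    return adj


# Hypothetical 2-D electrode positions: two close neighbours and one distant electrode.
adj = distance_graph([(0.0, 0.0), (0.1, 0.0), (5.0, 0.0)])
```

The resulting matrix (self-loops included, weight 1 on the diagonal) is the kind of fixed adjacency a GNN layer would consume alongside per-electrode time-series features.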

arXiv:2103.14929 [pdf, other]

ELM-based Frame Synchronization in Nonlinear Distortion Scenario Using Superimposed Training

Authors: Chaojin Qing, Wang Yu, Shuhai Tang, Chuangui Rao, Jiafan Wang

Abstract: The demand for high spectral efficiency places stricter requirements on frame synchronization (FS) in wireless communication systems. Meanwhile, the large number of nonlinear devices or blocks in such systems inevitably causes nonlinear distortion. To avoid occupying bandwidth resources and to overcome nonlinear distortion, an extreme learning machine (ELM)-based network is introduced into superimposed training-based FS under nonlinear distortion. First, a preprocessing procedure is used to extract the features of the synchronization metric (SM). Then, based on these rough SM features, an ELM network is constructed to estimate the frame boundary offset. Analysis and experimental results show that, compared with existing methods, the proposed method reduces the error probability of FS and the bit error rate (BER) of symbol detection (SD). In addition, this improvement is robust against parameter variations.

Submitted 27 March, 2021; originally announced March 2021.

Comments: 10 pages, 5 figures
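
A classic synchronization metric for superimposed training correlates the received frame against the known training sequence at every candidate offset; its peak gives a coarse frame-boundary estimate, and the metric vector is one plausible form of the "rough SM features" an ELM could refine. The sketch assumes a circularly shifted frame, +/-1 training and data sequences, a training power ratio rho, and additive Gaussian noise; it does not model nonlinear distortion or the paper's preprocessing.

```python
import numpy as np


def frame_offset(received, training):
    """Correlation-based synchronization metric for superimposed training.

    metric[tau] = |sum_k received[(k + tau) mod N] * training[k]|; its peak estimates the offset.
    """
    n = len(training)
    metric = np.array([abs(np.dot(np.roll(received, -tau), training)) for tau in range(n)])
    return int(np.argmax(metric)), metric


# Hypothetical frame: +/-1 training sequence superimposed on +/-1 data at power ratio rho.
rng = np.random.default_rng(1)
N, rho, true_offset = 512, 0.2, 137
c = rng.choice([-1.0, 1.0], N)
d = rng.choice([-1.0, 1.0], N)
frame = np.sqrt(rho) * c + np.sqrt(1.0 - rho) * d
received = np.roll(frame, true_offset) + 0.3 * rng.standard_normal(N)
estimate, metric = frame_offset(received, c)
```

Because the training shares the frame with data rather than occupying its own slots, no extra bandwidth is spent, at the cost of data interference in the metric.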

arXiv:2102.01990 [pdf]

A Deep Learning-Based Approach to Extracting Periosteal and Endosteal Contours of Proximal Femur in Quantitative CT Images

Authors: Yu Deng, Ling Wang, Chen Zhao, Shaojie Tang, Xiaoguang Cheng, Hong-Wen Deng, Weihua Zhou

Abstract: Automatic CT segmentation of the proximal femur is crucial for the diagnosis and risk stratification of orthopedic diseases; however, current methods for femur CT segmentation mainly rely on manual interactive segmentation, which is time-consuming and has limitations in both accuracy and reproducibility. In this study, we proposed an approach based on deep learning for the automatic extraction of the periosteal and endosteal contours of the proximal femur in order to differentiate cortical and trabecular bone compartments. A three-dimensional (3D) end-to-end fully convolutional neural network, which better combines information across neighboring slices and yields more accurate segmentation results, was developed for our segmentation task. A total of 100 subjects aged 50 to 87 years, with 24,399 slices of proximal femur CT images, were enrolled in this study. The separation of cortical and trabecular bone derived from the QCT software MIAF-Femur was used as the segmentation reference. We randomly divided the whole dataset into a training set with 85 subjects for 10-fold cross-validation and a test set with 15 subjects for evaluating the performance of models. Two models with the same network structure were trained, achieving a dice similarity coefficient (DSC) of 97.87% and 96.49% for the periosteal and endosteal contours, respectively. To further verify the model's femoral segmentation performance, we measured the volumes of different parts of the femur and compared them with the ground truth; the relative errors between the predicted results and the ground truth were all less than 5%. These results demonstrate strong potential for clinical use, including hip fracture risk prediction and finite element analysis.

Submitted 7 February, 2021; v1 submitted 3 February, 2021; originally announced February 2021.

arXiv:2011.06380 [pdf, other]

Automatic Neural Lyrics and Melody Composition

Authors: Gurunath Reddy Madhumani, Yi Yu, Florian Harscoët, Simon Canales, Suhua Tang

Abstract: In this paper, we propose a technique to address the most challenging aspect of the algorithmic songwriting process, enabling the human community to discover original lyrics and melodies suitable for the generated lyrics. The proposed songwriting system, Automatic Neural Lyrics and Melody Composition (AutoNLMC), is an attempt to automate the whole songwriting process using artificial neural networks. Our lyric-to-vector (lyric2vec) models, trained on a large dataset of lyric-melody pairs parsed at the syllable, word, and sentence levels, are large-scale embedding models that enable us to train data-driven models such as recurrent neural networks for popular English songs. AutoNLMC is an encoder-decoder sequential recurrent neural network model consisting of a lyric generator, a lyric encoder, and a melody decoder trained end-to-end. AutoNLMC is designed to automatically generate both lyrics and corresponding melodies for amateurs or people without musical knowledge. It can also take lyrics from professional lyricists to generate matching melodies. Qualitative and quantitative evaluations revealed that the proposed method is indeed capable of generating original lyrics and corresponding melodies for composing new songs.

Submitted 12 November, 2020; originally announced November 2020.

Comments: 15 pages

arXiv:2010.12146 [pdf, other]

doi 10.1109/ACCESS.2021.3070901

Reliable Over-the-Air Computation by Amplify-and-Forward Based Relay

Authors: Suhua Tang, Huarui Yin, Chao Zhang, Sadao Obana

Abstract: In typical sensor networks, data collection and processing are separated. A sink collects data from all nodes sequentially, which is very time consuming. Over-the-air computation, as a new paradigm for sensor networks, integrates data collection and processing in one slot: all nodes transmit their signals simultaneously as analog waveforms and the processing is done in the air. This method, although efficient, requires that signals from all nodes arrive at the sink aligned in signal magnitude so as to enable an unbiased estimation. For nodes far away from the sink with a low channel gain, misalignment in signal magnitude is unavoidable. To solve this problem, in this paper, we investigate amplify-and-forward based relaying, in which a relay node amplifies signals from many nodes at the same time. We first discuss the general relay model and a simple relay policy. Then, a coherent relay policy is proposed to reduce relay transmission power. Directly minimizing the computation error tends to over-increase node transmission power. Therefore, the two relay policies are further refined with a new metric, and the transmission power is reduced while the computation error is kept low. In addition, the coherent relay policy helps to reduce the relay transmission power by half, to below the limit, which brings it one step closer to practical applications.

Submitted 9 May, 2021; v1 submitted 22 October, 2020; originally announced October 2020.

Journal ref: in IEEE Access, vol. 9, pp. 53333-53342, 2021
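
The magnitude-alignment requirement can be seen in a single-hop sketch: if node i pre-scales its value by b_i = eta / h_i (channel inversion), every signal arrives scaled by the same eta and the superposition at the sink equals eta times the sum. The sketch assumes real-valued channel gains known perfectly at the nodes and a hypothetical common scaling eta; the relay policies from the paper are not modeled. The per-node power shows why weak-channel nodes are the problem the relay targets.

```python
import numpy as np


def aircomp_sum(x, h, eta=1.0, noise_std=0.0, rng=None):
    """Estimate sum(x) from one simultaneous analog transmission with channel inversion.

    Node i sends b_i * x_i with b_i = eta / h_i, so every signal arrives scaled by eta and
    the superposition equals eta * sum(x) plus noise. Returns (estimate, per-node power).
    """
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    b = eta / h
    y = np.sum(h * b * x)  # signals add up "in the air"
    if noise_std > 0.0:
        if rng is None:
            rng = np.random.default_rng()
        y += noise_std * rng.standard_normal()
    power = (b * x) ** 2  # grows as 1 / h_i**2, so weak-channel nodes dominate power use
    return y / eta, power


estimate, power = aircomp_sum([1.0, 2.0, 3.0], h=[1.0, 0.5, 0.1])
```

Here the node with gain 0.1 needs 900 units of power versus 1 for the node with gain 1.0; under a power cap it cannot fully invert its channel, which is the magnitude misalignment described above.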

arXiv:2010.08006 [pdf]

doi 10.1038/s41598-021-87762-2

Data Valuation for Medical Imaging Using Shapley Value: Application on A Large-scale Chest X-ray Dataset

Authors: Siyi Tang, Amirata Ghorbani, Rikiya Yamashita, Sameer Rehman, Jared A. Dunnmon, James Zou, Daniel L. Rubin

Abstract: The reliability of machine learning models can be compromised when trained on low quality data. Many large-scale medical imaging datasets contain low quality labels extracted from sources such as medical reports. Moreover, images within a dataset may have heterogeneous quality due to artifacts and biases arising from equipment or measurement errors. Therefore, algorithms that can automatically identify low quality data are highly desired. In this study, we used data Shapley, a data valuation metric, to quantify the value of training data to the performance of a pneumonia detection algorithm in a large chest X-ray dataset. We characterized the effectiveness of data Shapley in identifying low quality versus valuable data for pneumonia detection. We found that removing training data with high Shapley values decreased the pneumonia detection performance, whereas removing data with low Shapley values improved the model performance. Furthermore, there were more mislabeled examples in low Shapley value data and more true pneumonia cases in high Shapley value data. Our results suggest that low Shapley value indicates mislabeled or poor quality images, whereas high Shapley value indicates data that are valuable for pneumonia detection. Our method can serve as a framework for using data Shapley to denoise large-scale medical imaging datasets.

Submitted 15 October, 2020; originally announced October 2020.
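
Data Shapley values are typically estimated by Monte Carlo over random orderings of the training set: walk through each permutation and credit every point with the change in utility when it is added. The sketch below is a plain permutation-sampling estimator; the truncation tricks used in practice are omitted, and `utility` is a placeholder for "train on this subset and score on validation data" (the additive toy utility in the usage is only a sanity check, since its Shapley values equal the weights exactly).

```python
import numpy as np


def monte_carlo_shapley(n_points, utility, n_permutations=200, seed=0):
    """Permutation-sampling estimate of data Shapley values.

    utility(indices) -> float scores a model trained on the given subset of training points.
    """
    rng = np.random.default_rng(seed)
    values = np.zeros(n_points)
    for _ in range(n_permutations):
        perm = rng.permutation(n_points)
        prev = utility([])
        for pos, idx in enumerate(perm):
            curr = utility(list(perm[: pos + 1]))
            values[idx] += curr - prev  # marginal contribution of idx in this ordering
            prev = curr
    return values / n_permutations


# Sanity check with an additive utility: each point's value is exactly its weight.
weights = [0.5, -0.2, 1.0]
vals = monte_carlo_shapley(3, lambda idx: sum(weights[i] for i in idx), n_permutations=50)
```

With a real classifier, each utility call is a full training run, so the cost is roughly n_permutations times n_points fits; points with the lowest estimates are the candidates for relabeling or removal.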

arXiv:2009.13477 [pdf]

doi 10.1088/1361-6560/abef45

Super-Resolution Ultrasound Localization Microscopy Based on a High Frame-rate Clinical Ultrasound Scanner: An In-human Feasibility Study

Authors: Chengwu Huang, Wei Zhang, Ping Gong, U-Wai Lok, Shanshan Tang, Tinghui Yin, Xirui Zhang, Lei Zhu, Maodong Sang, Pengfei Song, Rongqin Zheng, Shigao Chen

Abstract: Non-invasive detection of microvascular alterations in deep tissues in vivo provides critical information for clinical diagnosis and evaluation of a broad spectrum of pathologies. Recently, the emergence of super-resolution ultrasound localization microscopy (ULM) has offered new possibilities for clinical imaging of microvasculature at the capillary level. Currently, the clinical utility of ULM on clinical ultrasound scanners is hindered by technical limitations, such as long data acquisition time and compromised tracking performance associated with low imaging frame rates. Here we present in-human ULM on a high frame-rate (HFR) clinical ultrasound scanner to achieve super-resolution microvessel imaging using a short acquisition time (<10 s). Ultrasound microbubble (MB) data were acquired from different human tissues (liver, kidney, pancreatic, and breast tumor) using an HFR clinical scanner. By leveraging the HFR and advanced processing techniques including sub-pixel motion registration, MB signal separation, and Kalman filter-based tracking, MBs can be robustly localized and tracked for successful ULM under the circumstances of relatively high MB concentration and limited data acquisition time in humans. Subtle morphological and hemodynamic information was demonstrated on data acquired with single breath-hold and free-hand scanning. Compared with contrast-enhanced power Doppler generated from the same MB dataset, ULM showed a 5.7-fold resolution improvement in a vessel and provided a wide-range, Doppler angle-independent flow speed measurement. This study demonstrated the feasibility of ultrafast in-human ULM in various human tissues based on a clinical scanner that supports HFR imaging, and showed great potential for the implementation of super-resolution ultrasound microvessel imaging in a myriad of clinical applications involving microvascular abnormalities and pathologies.

Submitted 28 September, 2020; originally announced September 2020.

Comments: 41 pages, 5 figures, 4 supplemental figures
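
ULM gets below the diffraction limit by localizing each isolated microbubble to sub-pixel precision. The simplest localizer is the intensity-weighted centroid of a small patch around a detected peak, sketched below; the abstract does not say which localizer the authors use, and the Gaussian point-spread function in the usage is an assumption. Tracking (the Kalman filter step) would then link these positions across frames.

```python
import numpy as np


def weighted_centroid(patch):
    """Sub-pixel (row, col) of a microbubble as the intensity-weighted centroid of a patch."""
    patch = np.asarray(patch, dtype=float)
    patch = np.clip(patch - patch.min(), 0.0, None)  # remove a constant background offset
    total = patch.sum()
    rows, cols = np.indices(patch.shape)
    return float((rows * patch).sum() / total), float((cols * patch).sum() / total)


# Hypothetical bubble echo: Gaussian PSF (sigma 1.5 px) centred between pixels.
r, c = np.indices((24, 24))
psf = np.exp(-((r - 10.3) ** 2 + (c - 12.7) ** 2) / (2 * 1.5 ** 2))
row, col = weighted_centroid(psf)
```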

arXiv:2009.11975 [pdf, other]

CoFF: Cooperative Spatial Feature Fusion for 3D Object Detection on Autonomous Vehicles

Authors: Jingda Guo, Dominic Carrillo, Sihai Tang, Qi Chen, Qing Yang, Song Fu, Xi Wang, Nannan Wang, Paparao Palacharla

Abstract: To reduce the amount of transmitted data, feature-map-based fusion has recently been proposed as a practical solution to cooperative 3D object detection by autonomous vehicles. The precision of object detection, however, may require significant improvement, especially for objects that are far away or occluded. To address this critical issue for the safety of autonomous vehicles and human beings, we propose a cooperative spatial feature fusion (CoFF) method for autonomous vehicles to effectively fuse feature maps for achieving higher 3D object detection performance. Specifically, CoFF differentiates weights among feature maps for a more guided fusion, based on how much new semantic information is provided by the received feature maps. It also enhances the inconspicuous features corresponding to far/occluded objects to improve their detection precision. Experimental results show that CoFF achieves a significant improvement in terms of both detection precision and effective detection range for autonomous vehicles, compared to previous feature fusion solutions.

Submitted 24 September, 2020; originally announced September 2020.

arXiv:2008.11111 [pdf, other]

Dynamics of feed forward induced interference training

Authors: Shirui Tang

Abstract: Perceptron model updating with back propagation has become routine in deep learning. A continuous feed-forward procedure is required for back propagation to function properly. Doubting the physical interpretation of transformer-based models such as GPT implied by this routine explanation, a new training method is proposed to keep the physics self-consistent. The GPT model is treated as a space-time diagram, and the worldlines of signals are traced to identify the possible paths along which signals must travel for a self-attention event to occur. With a slight modification, self-attention can be viewed as an Ising-model interaction, which allows the objective to be designed as the energy of the system. The target is treated as an external magnetic field inducing signals modeled as magnetic dipoles. A probability network is designed to pilot input signals traveling for different durations through different routes. A rule for updating the probabilities is designed to form constructive interference at target locations so that the instantaneous energy can be maximised. An experiment was conducted on a 4-class classification problem extracted from MNIST. The results exhibit interesting but expected behaviours, which do not exist in a BP-updated network and are more like learning in a real human, especially in the few-shot scenario.

Submitted 16 October, 2020; v1 submitted 24 August, 2020; originally announced August 2020.

arXiv:2007.02091 [pdf]

doi 10.1016/j.ijleo.2021.167551

Semantic Segmentation Using Deep Learning to Extract Total Extraocular Muscles and Optic Nerve from Orbital Computed Tomography Images

Authors: Fubao Zhu, Zhengyuan Gao, Chen Zhao, Zelin Zhu, Yanyun Liu, Shaojie Tang, Chengzhi Jiang, Xinhui Li, Min Zhao, Weihua Zhou

Abstract: Objectives: Precise segmentation of total extraocular muscles (EOM) and optic nerve (ON) is essential to assess anatomical development and progression of thyroid-associated ophthalmopathy (TAO). We aim to develop a semantic segmentation method based on deep learning to extract the total EOM and ON from orbital CT images in patients with suspected TAO. Materials and Methods: A total of 7,879 images obtained from 97 subjects who underwent orbital CT scans due to suspected TAO were enrolled in this study. Eighty-eight patients were randomly assigned to the training/validation dataset, and the rest were put into the test dataset. Contours of the total EOM and ON in all the patients were manually delineated by experienced radiologists as the ground truth. A three-dimensional (3D) end-to-end fully convolutional neural network called semantic V-net (SV-net) was developed for our segmentation task. Intersection over Union (IoU) was measured to evaluate the accuracy of the segmentation results, and Pearson correlation analysis was used to evaluate the volumes measured from our segmentation results against those from the ground truth. Results: On the test dataset, our model achieved an overall IoU of 0.8207; the IoU was 0.7599 for the superior rectus muscle, 0.8183 for the lateral rectus muscle, 0.8481 for the medial rectus muscle, 0.8436 for the inferior rectus muscle and 0.8337 for the optic nerve. The volumes measured from our segmentation results agreed well with those from the ground truth (all R>0.98, P<0.0001).
Conclusion: The qualitative and quantitative evaluations demonstrate excellent performance of our method in automatically extracting the total EOM and ON and measuring their volumes in orbital CT images. The method shows great promise for clinical assessment of these anatomical structures in the diagnosis and prognosis of TAO.

Submitted 4 July, 2020; originally announced July 2020.

Comments: 17 pages, 8 figures
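
The SV-net evaluation combines three simple computations: IoU between predicted and reference masks, structure volume as voxel count times voxel volume, and the Pearson R between predicted and reference volumes across patients. A generic sketch follows; the voxel volume and example volumes are hypothetical, and the empty-mask convention is an assumption. For binary masks IoU and Dice are monotonically related, Dice = 2 IoU / (1 + IoU), so an IoU of 1/3 corresponds to a Dice of 0.5.

```python
import numpy as np


def iou(pred, target):
    """Intersection over Union of two binary masks (2-D slices or 3-D volumes)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # both empty: treated as perfect agreement (a convention)
    return float(np.logical_and(pred, target).sum() / union)


def mask_volume(mask, voxel_volume_mm3):
    """Volume of a segmented structure: voxel count times the volume of one voxel."""
    return float(np.asarray(mask, dtype=bool).sum() * voxel_volume_mm3)


def volume_agreement(predicted_volumes, reference_volumes):
    """Pearson correlation coefficient R between predicted and reference volumes."""
    return float(np.corrcoef(predicted_volumes, reference_volumes)[0, 1])


a = np.zeros((4, 4)); a[:2] = 1
b = np.zeros((4, 4)); b[1:3] = 1
print(iou(a, b))            # 0.333... (4 shared pixels out of 12 in the union)
print(mask_volume(a, 0.5))  # 4.0
```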

arXiv:2006.11655 [pdf]

A Novel Method for ECG Signal Classification via One-Dimensional Convolutional Neural Network

Authors: Xuan Hua, Jungang Han, Chen Zhao, Haipeng Tang, Zhuo He, Jinshan Tang, Qing-Hui Chen, Shaojie Tang, Weihua Zhou

Abstract: This paper presents an end-to-end ECG signal classification method based on a novel segmentation strategy via 1D Convolutional Neural Networks (CNN) to aid the classification of ECG signals. The ECG segmentation strategy named R-R-R strategy (i.e., retaining ECG data between the R peaks just before and after the current R peak) for segmenting the original ECG data into segments in order to train a… ▽ More This paper presents an end-to-end ECG signal classification method based on a novel segmentation strategy via 1D Convolutional Neural Networks (CNN) to aid the classification of ECG signals. The ECG segmentation strategy named R-R-R strategy (i.e., retaining ECG data between the R peaks just before and after the current R peak) for segmenting the original ECG data into segments in order to train and test the 1D CNN models. The novel strategy mimics physicians in scanning ECG to a greater extent, and maximizes the inherent information of ECG segments. The performance of the classification models for 5-class and 6-class are verified with ECG signals from 48 records of the MIT-BIH arrhythmia database. As the heartbeat types are divided into 5 classes (i.e., normal beat, left bundle branch block beat, right bundle branch block beat, ventricular ectopic beat, and paced beat) in the MIT-BIH, the best classification accuracy, the area under the curve (AUC), the sensitivity and the F1-score reach 99.24%, 0.9994, 0.99 and 0.99, respectively. As the heartbeat types are divided into 6 classes (i.e., normal beat, left bundle branch block beat, right bundle branch block beat, ventricular ectopic beat, paced beat and other beats) in the MIT-BIH, the beat classification accuracy, the AUC, the sensitivity, and the F1-score reach 97.02%, 0.9966, 0.97, and 0.97, respectively. 
Meanwhile, when the heartbeat types are divided into 5 classes according to the recommended practice of the Association for the Advancement of Medical Instrumentation (AAMI) (i.e., normal beats, supraventricular ectopic beats, ventricular ectopic beats, fusion beats, and unclassifiable beats), the beat classification accuracy, sensitivity, and F1-score reach 97.45%, 0.97, and 0.97, respectively. The experimental results show that the proposed method achieves better performance than state-of-the-art methods.

Submitted 20 June, 2020; originally announced June 2020.

arXiv:2006.02666 [pdf]

Deep Sequential Feature Learning in Clinical Image Classification of Infectious Keratitis

Authors: Yesheng Xu, Ming Kong, Wenjia Xie, Runping Duan, Zhengqing Fang, Yuxiao Lin, Qiang Zhu, Siliang Tang, Fei Wu, Yu-Feng Yao

Abstract: Infectious keratitis is one of the most common corneal diseases, in which pathogens grow in the cornea, leading to inflammation and destruction of the corneal tissue. Infectious keratitis is a medical emergency that requires a rapid and accurate diagnosis so that prompt and precise treatment can be started to halt disease progression and limit the extent of corneal damage; otherwise, it may develop into a sight-threatening or even eye-globe-threatening condition. In this paper, we propose a sequential-level deep learning model that effectively discriminates the distinctions and subtleties of infectious corneal disease through the classification of clinical images. In this approach, we devise a mechanism that preserves the spatial structures of clinical images and disentangles the informative features for clinical image classification of infectious keratitis. In competition with 421 ophthalmologists on 120 test images, the proposed sequential-level deep model achieved 80.00% diagnostic accuracy, far better than the 49.27% diagnostic accuracy achieved by the ophthalmologists.

Submitted 4 June, 2020; originally announced June 2020.

Comments: Accepted by Engineering

arXiv:2005.13534 [pdf, ps, other]

Robot-assisted Backscatter Localization for IoT Applications

Authors: Shengkai Zhang, Wei Wang, Sheyang Tang, Shi Jin, Tao Jiang

Abstract: Recent years have witnessed the rapid proliferation of backscatter technologies that provide ubiquitous, long-term connectivity to empower smart cities and smart homes. Localizing such backscatter tags is crucial for IoT-based smart applications. However, current backscatter localization systems require prior knowledge of the site, either a map or landmarks with known positions, which makes deployment laborious. To enable a universal localization service, this paper presents Rover, an indoor localization system that localizes multiple backscatter tags without any start-up cost, using a robot equipped with inertial sensors. Rover runs in a joint optimization framework, fusing measurements from backscattered WiFi signals and inertial sensors to simultaneously estimate the locations of both the robot and the connected tags. Our design addresses practical issues including interference among multiple tags, real-time processing, and the data marginalization problem in dealing with degenerate motions. We prototype Rover using off-the-shelf WiFi chips and customized backscatter tags. Our experiments show that Rover achieves localization accuracies of 39.3 cm for the robot and 74.6 cm for the tags.

Submitted 21 May, 2020; originally announced May 2020.

Comments: To appear in IEEE Transactions on Wireless Communications. arXiv admin note: substantial text overlap with arXiv:1908.03297

Showing 1–50 of 64 results for author: Tang, S