-
Leveraging Self-Supervised Learning for MIMO-OFDM Channel Representation and Generation
Authors:
Zongxi Liu,
Jiacheng Chen,
Yunting Xu,
Ting Ma,
Jingbo Liu,
Haibo Zhou,
Dusit Niyato
Abstract:
In communications theory, the capacity of multiple input multiple output-orthogonal frequency division multiplexing (MIMO-OFDM) systems is fundamentally determined by wireless channels, which exhibit both diversity and correlation in the spatial, frequency and temporal domains. It is further envisioned to exploit the inherent nature of channels, namely representation, to achieve geolocation-based MIMO transmission for 6G, exemplified by the fully-decoupled radio access network (FD-RAN). Accordingly, this paper first employs self-supervised learning to obtain channel representations from unlabeled channel data, then proposes a channel-generation-assisted approach for determining the MIMO precoding matrix solely based on geolocation. Specifically, we exploit the small-scale temporal-domain variations of channels at a fixed geolocation, and design an ingenious pretext task tailored for contrastive learning. Then, a Transformer-based encoder is trained to output channel representations. We further develop a conditional diffusion generator to generate channel representations from geolocation. Finally, a Transformer-encoder-based decoder is utilized to reconstruct channels from the generated representations, where the optimal channel is selected for calculating the precoding matrix for both single- and dual-BS transmission. We conduct experiments on a public ray-tracing channel dataset, and the extensive simulation results demonstrate the effectiveness of our channel representation method and showcase the performance improvement in geolocation-based MIMO transmission.
Submitted 23 May, 2024;
originally announced July 2024.
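The abstract does not name the contrastive objective. A minimal sketch, assuming a standard InfoNCE (SimCLR-style) loss in which two channel snapshots taken at the same geolocation a few time steps apart form a positive pair and all other geolocations in the batch act as negatives. Random vectors stand in for the Transformer encoder outputs; `info_nce` and the toy data are hypothetical.

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE loss for a batch of positive pairs (z_a[i], z_b[i]).

    Row i of z_a and row i of z_b are two views of the same geolocation
    (e.g. channel snapshots a few time steps apart); every other row in
    the batch serves as a negative.
    """
    # L2-normalise so that dot products are cosine similarities
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature                    # (B, B) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))             # positives lie on the diagonal

rng = np.random.default_rng(0)
B, D = 8, 32
anchor = rng.standard_normal((B, D))                      # stand-in embeddings, one per geolocation
positive = anchor + 0.05 * rng.standard_normal((B, D))    # small temporal perturbation
unrelated = rng.standard_normal((B, D))

loss_pos = info_nce(anchor, positive)
loss_rand = info_nce(anchor, unrelated)
print(f"matched pairs: {loss_pos:.3f}  unrelated pairs: {loss_rand:.3f}")
```

Minimising this loss pulls embeddings of the same location together while pushing different locations apart, which is the property a geolocation-conditioned generator can later exploit.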
-
The USTC-NERCSLIP Systems for The ICMC-ASR Challenge
Authors:
Minghui Wu,
Luzhen Xu,
Jie Zhang,
Haitao Tang,
Yanyan Yue,
Ruizhi Liao,
Jintao Zhao,
Zhengzhe Zhang,
Yichi Wang,
Haoyin Yan,
Hongliang Yu,
Tongle Ma,
Jiachen Liu,
Chongliang Wu,
Yongchao Li,
Yanyong Zhang,
Xin Fang,
Yue Zhang
Abstract:
This report describes the system submitted to the In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) challenge, which considers the ASR task with multi-speaker overlap and Mandarin accent dynamics in the in-car setting. For the front end, we implement speaker diarization using multi-speaker embeddings based on self-supervised learning representations, and beamforming using the speaker position. For ASR, we employ an iterative pseudo-label generation method based on a fusion model to obtain text labels for unsupervised data. To mitigate the impact of accent, an Accent-ASR framework is proposed, which captures pronunciation-related accent features at a fine-grained level and linguistic information at a coarse-grained level. On the ICMC-ASR eval set, the proposed system achieves a CER of 13.16% on track 1 and a cpCER of 21.48% on track 2, significantly outperforming the official baseline system and ranking first on both tracks.
Submitted 2 July, 2024;
originally announced July 2024.
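The two reported metrics can be made concrete. A minimal sketch, assuming the usual definitions: CER is the total character edit distance divided by the number of reference characters, and cpCER (by analogy with cpWER) concatenates each speaker's transcripts and takes the speaker permutation with the fewest edits. The function names and example strings are illustrative, not the challenge's official scoring script.

```python
from itertools import permutations

def edit_distance(ref, hyp):
    """Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution (0 if match)
        prev = cur
    return prev[-1]

def cer(refs, hyps):
    """Corpus-level CER: total character edits / total reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)

def cp_cer(ref_by_speaker, hyp_by_speaker):
    """Concatenated minimum-permutation CER for one session.

    Each list holds one concatenated transcript per speaker; the shorter
    list is padded with empty strings and the speaker permutation with
    the fewest total edits is used.
    """
    n = max(len(ref_by_speaker), len(hyp_by_speaker))
    refs = ref_by_speaker + [""] * (n - len(ref_by_speaker))
    hyps = hyp_by_speaker + [""] * (n - len(hyp_by_speaker))
    best = min(sum(edit_distance(r, h) for r, h in zip(refs, perm))
               for perm in permutations(hyps))
    return best / sum(len(r) for r in refs)

print(cer(["打开车窗", "播放音乐"], ["打开车门", "播放音乐"]))  # 1 edit / 8 chars = 0.125
print(cp_cer(["你好", "再见"], ["再见", "你好"]))              # 0.0: speaker labels swapped
```

The permutation search is factorial in the number of speakers, which is acceptable for the few occupants of a car.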
-
Centerline Boundary Dice Loss for Vascular Segmentation
Authors:
Pengcheng Shi,
Jiesi Hu,
Yanwu Yang,
Zilve Gao,
Wei Liu,
Ting Ma
Abstract:
Vascular segmentation in medical imaging plays a crucial role in morphological and functional assessment. Traditional methods, like the centerline Dice (clDice) loss, ensure topology preservation but falter in capturing geometric details, especially under translation and deformation. The combination of clDice with traditional Dice loss can lead to diameter imbalance, favoring larger vessels. Addressing these challenges, we introduce the centerline boundary Dice (cbDice) loss function, which harmonizes topological integrity and geometric nuances, ensuring consistent segmentation across various vessel sizes. cbDice enriches the clDice approach by including boundary-aware aspects, thereby improving geometric detail recognition. It matches the performance of the boundary difference over union (B-DoU) loss through a mask-distance-based approach, enhancing translation sensitivity. Crucially, cbDice incorporates radius information from vascular skeletons, enabling uniform adaptation to vascular diameter changes and maintaining balance in branch growth and fracture impacts. Furthermore, we conducted a theoretical analysis of clDice variants (cl-X-Dice). We validated cbDice's efficacy on three diverse vascular segmentation datasets, encompassing both 2D and 3D, and binary and multi-class segmentation. In particular, the method integrated with cbDice demonstrated outstanding performance on the MICCAI 2023 TopCoW Challenge dataset. Our code is publicly available at: https://github.com/PengchengShi1220/cbDice.
Submitted 1 July, 2024;
originally announced July 2024.
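The diameter problem that motivates cbDice is easy to reproduce with the plain Dice and clDice scores. A minimal sketch, assuming the standard clDice definition (harmonic mean of topology precision and topology sensitivity) evaluated on hard masks; this is not cbDice itself. Skeletons are drawn by hand here; in practice a function such as `skimage.morphology.skeletonize` or a differentiable soft skeleton would compute them.

```python
import numpy as np

def dice(pred, gt, eps=1e-8):
    """Volumetric Dice overlap of two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    return 2 * np.logical_and(pred, gt).sum() / (pred.sum() + gt.sum() + eps)

def cl_dice(pred, gt, pred_skel, gt_skel, eps=1e-8):
    """Centerline Dice from binary masks and their precomputed skeletons."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    pred_skel, gt_skel = pred_skel.astype(bool), gt_skel.astype(bool)
    # topology precision: fraction of the predicted skeleton inside the GT vessel
    t_prec = np.logical_and(pred_skel, gt).sum() / (pred_skel.sum() + eps)
    # topology sensitivity: fraction of the GT skeleton inside the predicted vessel
    t_sens = np.logical_and(gt_skel, pred).sum() / (gt_skel.sum() + eps)
    return 2 * t_prec * t_sens / (t_prec + t_sens + eps)

gt = np.zeros((9, 9), dtype=bool)
gt[3:6, :] = True            # vessel 3 px wide
pred = np.zeros((9, 9), dtype=bool)
pred[4, :] = True            # same path, only 1 px wide
skel = np.zeros((9, 9), dtype=bool)
skel[4, :] = True            # both centerlines coincide (drawn by hand)

print(round(dice(pred, gt), 3))                 # 0.5: volume overlap penalises the thin vessel
print(round(cl_dice(pred, gt, skel, skel), 3))  # 1.0: centerline overlap ignores the diameter error
```

clDice scores the too-thin prediction as perfect, which is why cbDice adds boundary and radius information from the skeleton.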
-
PAPR Reduction with Pre-chirp Selection for Affine Frequency Division Multiplexing
Authors:
Haozhi Yuan,
Yin Xu,
Xinghao Guo,
Yao Ge,
Tianyao Ma,
Haoyang Li,
Dazhi He,
Wenjun Zhang
Abstract:
Affine frequency division multiplexing (AFDM) is a promising new multicarrier technique for high-mobility communications based on the discrete affine Fourier transform (DAFT). By properly tuning the pre-chirp parameter and the post-chirp parameter in the DAFT, the effective channel in the DAFT domain can completely circumvent path overlap, thereby constituting a full representation of the delay-Doppler profile. However, AFDM suffers from a high peak-to-average power ratio (PAPR), stemming from the randomness of the modulated symbols. In this letter, a novel algorithm named grouped pre-chirp selection (GPS) is proposed to reduce PAPR by strategically varying the pre-chirp parameter across subcarrier groups. Initially, it is established that key AFDM properties are maintained when implementing GPS. Next, we detail the operational procedures of the GPS algorithm, elucidating its principle for PAPR reduction and emphasizing its computational efficiency advantages. Finally, simulation results employing the complementary cumulative distribution function (CCDF) validate the effectiveness of the proposed GPS in reducing PAPR.
Submitted 25 July, 2024; v1 submitted 20 June, 2024;
originally announced June 2024.
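The idea of selecting pre-chirp values per subcarrier group can be illustrated with a brute-force search. A minimal sketch, assuming the AFDM modulator s = Λ_c1^H F^H Λ_c2^H x with Λ_c = diag(exp(-j2πcn²)), and assuming the pre-chirp parameter is c2, the chirp applied to the DAFT-domain symbols. The exhaustive search below is a selected-mapping-style stand-in, not the letter's efficient GPS procedure; the candidate set, group count and `grouped_prechirp_search` are hypothetical, and the receiver would still need to learn the chosen c2 values.

```python
import numpy as np
from itertools import product

def afdm_modulate(x, c1, c2):
    """Inverse DAFT s = Lambda_c1^H F^H Lambda_c2^H x, Lambda_c = diag(exp(-j2*pi*c*n^2)).
    c2 may be a scalar or a length-N array (one pre-chirp value per subcarrier)."""
    N = len(x)
    n = np.arange(N)
    c2 = np.broadcast_to(c2, (N,))
    pre = x * np.exp(2j * np.pi * c2 * n**2)       # Lambda_c2^H x
    s = np.fft.ifft(pre) * np.sqrt(N)              # unitary inverse DFT
    return s * np.exp(2j * np.pi * c1 * n**2)      # Lambda_c1^H

def afdm_demodulate(s, c1, c2):
    """Forward DAFT Lambda_c2 F Lambda_c1 s (undoes afdm_modulate)."""
    N = len(s)
    n = np.arange(N)
    c2 = np.broadcast_to(c2, (N,))
    y = np.fft.fft(s * np.exp(-2j * np.pi * c1 * n**2)) / np.sqrt(N)
    return y * np.exp(-2j * np.pi * c2 * n**2)

def papr_db(s):
    p = np.abs(s) ** 2
    return 10 * np.log10(p.max() / p.mean())

def grouped_prechirp_search(x, c1, candidates, n_groups):
    """Try every assignment of candidate c2 values to contiguous subcarrier
    groups and keep the lowest-PAPR one (exhaustive stand-in)."""
    N = len(x)
    group_of = np.arange(N) * n_groups // N        # group index of each subcarrier
    cand = np.asarray(candidates)
    best_papr, best_c2 = np.inf, None
    for combo in product(range(len(cand)), repeat=n_groups):
        c2 = cand[np.asarray(combo)][group_of]
        p = papr_db(afdm_modulate(x, c1, c2))
        if p < best_papr:
            best_papr, best_c2 = p, c2
    return best_papr, best_c2

N = 64
rng = np.random.default_rng(1)
x = (rng.choice([-1.0, 1.0], N) + 1j * rng.choice([-1.0, 1.0], N)) / np.sqrt(2)  # QPSK
c1 = 3 / (2 * N)                          # hypothetical post-chirp value
candidates = [0.0, 0.13, 0.29, 0.41]      # hypothetical pre-chirp candidates
baseline = papr_db(afdm_modulate(x, c1, candidates[0]))
best_papr, best_c2 = grouped_prechirp_search(x, c1, candidates, n_groups=4)
x_rec = afdm_demodulate(afdm_modulate(x, c1, best_c2), c1, best_c2)
print(f"single c2: {baseline:.2f} dB, grouped selection: {best_papr:.2f} dB")
```

PAPR is measured on Nyquist-rate samples only; oversampling would give a more faithful estimate. The round trip through `afdm_demodulate` shows that per-group c2 values keep the transform invertible.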
-
Whistle: Data-Efficient Multilingual and Crosslingual Speech Recognition via Weakly Phonetic Supervision
Authors:
Saierdaer Yusuyin,
Te Ma,
Hao Huang,
Wenbo Zhao,
Zhijian Ou
Abstract:
There exist three approaches for multilingual and crosslingual automatic speech recognition (MCL-ASR): supervised pre-training with phonetic transcription, supervised pre-training with graphemic transcription, and self-supervised pre-training. We find that pre-training with phonetic supervision has been underappreciated so far for MCL-ASR, while conceptually it is more advantageous for information sharing between different languages. This paper explores the approach of pre-training with weakly phonetic supervision towards data-efficient MCL-ASR, which is called Whistle. We relax the requirement of gold-standard human-validated phonetic transcripts, and obtain International Phonetic Alphabet (IPA) based transcription by leveraging the LanguageNet grapheme-to-phoneme (G2P) models. We construct a common experimental setup based on the CommonVoice dataset, called CV-Lang10, with 10 seen languages and 2 unseen languages. A set of experiments is conducted on CV-Lang10 to compare, as fairly as possible, the three approaches under the common setup for MCL-ASR. Experiments demonstrate the advantages of phoneme-based models (Whistle) for MCL-ASR, in terms of speech recognition for seen languages, crosslingual performance for unseen languages with different amounts of few-shot data, overcoming catastrophic forgetting, and training efficiency. It is found that when training data are more limited, phoneme supervision can achieve better results than subword supervision and self-supervision, thereby providing higher data efficiency. To support reproducibility and promote future research along this direction, we will release the code, models and data for the whole pipeline of Whistle at https://github.com/thu-spmi/CAT upon publication.
Submitted 4 June, 2024;
originally announced June 2024.
-
Multi-Beam Integrated Sensing and Communication: State-of-the-Art, Challenges and Opportunities
Authors:
Yinxiao Zhuo,
Tianqi Mao,
Haojin Li,
Chen Sun,
Zhaocheng Wang,
Zhu Han,
Sheng Chen
Abstract:
Integrated sensing and communication (ISAC) has been envisioned as a critical enabling technology for next-generation wireless communication, allowing communication devices to detect the location and motion of their surroundings. This additional sensing capability leads to a substantial network quality gain and an expansion of service scenarios. As the system evolves to millimeter wave (mmWave) and higher frequencies, ISAC can simultaneously achieve ultra-high communication throughput and high radar resolution with a compact design, relying on directional beamforming to counter path loss. With multi-beam technology, the dual functions of ISAC can be seamlessly incorporated at the beamspace level by unleashing the potential of joint beamforming. To this end, this article investigates the key technologies for multi-beam ISAC systems. We begin with an overview of the current state-of-the-art solutions in multi-beam ISAC. Subsequently, a detailed analysis of the advantages associated with multi-beam ISAC is provided. Additionally, the key technologies for the transmitter, channel and receiver of multi-beam ISAC are introduced. Finally, we explore the challenges and opportunities presented by multi-beam ISAC, offering valuable insights into this emerging field.
Submitted 30 May, 2024;
originally announced May 2024.
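The basic multi-beam idea, one array radiating a communication beam and a sensing beam at the same time, can be shown with a uniform linear array (ULA). A minimal sketch, assuming half-wavelength spacing and a simple power-weighted superposition of two steering vectors; this is not the article's joint beamforming design, and the 70/30 power split and angles are hypothetical.

```python
import numpy as np

def steering(theta_deg, n_ant, spacing=0.5):
    """Unit-norm ULA steering vector; spacing in wavelengths."""
    n = np.arange(n_ant)
    return np.exp(2j * np.pi * spacing * n * np.sin(np.deg2rad(theta_deg))) / np.sqrt(n_ant)

def multibeam_weights(theta_comm, theta_sense, n_ant, rho=0.7):
    """Superimpose a communication beam and a sensing beam.
    rho is the share of power given to the communication beam (hypothetical split)."""
    w = (np.sqrt(rho) * steering(theta_comm, n_ant)
         + np.sqrt(1 - rho) * steering(theta_sense, n_ant))
    return w / np.linalg.norm(w)        # total transmit power normalised to 1

def beam_gain(w, theta_deg):
    """Array gain |a(theta)^H w|^2 toward angle theta."""
    return np.abs(np.vdot(steering(theta_deg, len(w)), w)) ** 2

w = multibeam_weights(theta_comm=20.0, theta_sense=-40.0, n_ant=32)
for theta in (20.0, -40.0, 0.0):
    print(theta, round(float(beam_gain(w, theta)), 3))
```

The two main lobes share the array aperture, so rho directly trades communication gain against sensing gain; joint beamforming designs optimise this trade-off rather than fixing it.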
-
QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge
Authors:
Hongwei Bran Li,
Fernando Navarro,
Ivan Ezhov,
Amirhossein Bayat,
Dhritiman Das,
Florian Kofler,
Suprosanna Shit,
Diana Waldmannstetter,
Johannes C. Paetzold,
Xiaobin Hu,
Benedikt Wiestler,
Lucas Zimmer,
Tamaz Amiranashvili,
Chinmay Prabhakar,
Christoph Berger,
Jonas Weidner,
Michelle Alonso-Basant,
Arif Rashid,
Ujjwal Baid,
Wesam Adel,
Deniz Ali,
Bhakti Baheti,
Yingbin Bai,
Ishaan Bhatt,
Sabri Can Cetindag
, et al. (55 additional authors not shown)
Abstract:
Uncertainty in medical image segmentation tasks, especially inter-rater variability arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the development and evaluation of automated segmentation algorithms. Accurately modeling and quantifying this variability is essential for enhancing the robustness and clinical applicability of these algorithms. We report the set-up and summarize the benchmark results of the Quantification of Uncertainties in Biomedical Image Quantification Challenge (QUBIQ), which was organized in conjunction with the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI) in 2020 and 2021. The challenge focuses on the uncertainty quantification of medical image segmentation, considering the omnipresence of inter-rater variability in imaging datasets. The large collection of images with multi-rater annotations features various modalities such as MRI and CT; various organs such as the brain, prostate, kidney, and pancreas; and different image dimensionalities (2D and 3D). A total of 24 teams submitted different solutions to the problem, combining various baseline models, Bayesian neural networks, and ensemble model techniques. The obtained results indicate the importance of ensemble models, as well as the need for further research to develop efficient methods for uncertainty quantification in 3D segmentation tasks.
Submitted 24 June, 2024; v1 submitted 19 March, 2024;
originally announced May 2024.
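Multi-rater annotations can be turned into a probabilistic target and scored against a probabilistic prediction. A minimal sketch, assuming the soft ground truth is the mean of the binary rater masks and the score is the Dice coefficient averaged over binarisation thresholds 0.1, 0.2, ..., 0.9; this is our reading of a QUBIQ-style evaluation, not its official code. The 1D toy masks are made up.

```python
import numpy as np

def soft_label(rater_masks):
    """Average binary rater masks into a probabilistic ground truth."""
    return np.mean(np.stack(rater_masks).astype(float), axis=0)

def dice(a, b):
    """Dice of two boolean masks; two empty masks count as perfect agreement."""
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom

def thresholded_dice(prob_pred, prob_gt, thresholds=np.linspace(0.1, 0.9, 9)):
    """Mean Dice after binarising prediction and soft label at each threshold."""
    return float(np.mean([dice(prob_pred > t, prob_gt > t) for t in thresholds]))

raters = [np.array([0, 1, 1, 1, 0, 0]),
          np.array([0, 0, 1, 1, 0, 0]),
          np.array([0, 0, 1, 1, 1, 0])]
gt = soft_label(raters)                  # [0, 1/3, 1, 1, 1/3, 0]
majority = (gt > 0.5).astype(float)      # a binary prediction discards rater disagreement
print(thresholded_dice(gt, gt))          # 1.0
print(thresholded_dice(majority, gt))    # about 0.889 (8/9): penalised at low thresholds
```

A model that outputs calibrated probabilities, for example an ensemble average, can match the soft label at every threshold, whereas a single hard mask cannot.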
-
DAFT-Spread Affine Frequency Division Multiple Access for Downlink Transmission
Authors:
Yiwei Tao,
Miaowen Wen,
Yao Ge,
Tianqi Mao,
Lixia Xiao,
Jun Li
Abstract:
Affine frequency division multiplexing (AFDM) and orthogonal AFDM access (O-AFDMA) are promising techniques based on chirp signals, which are able to suppress the performance deterioration caused by Doppler shifts in high-mobility scenarios. However, the high peak-to-average power ratio (PAPR) of AFDM and O-AFDMA remains a crucial problem that severely limits their practical application. In this paper, based on the properties of AFDM systems, we propose a discrete affine Fourier transform (DAFT)-spread AFDMA scheme, named DAFT-s-AFDMA, which significantly reduces the PAPR by resorting to the DAFT. We formulate the transmitted time-domain signals of the proposed DAFT-s-AFDMA schemes with localized and interleaved chirp subcarrier allocation strategies. Accordingly, we derive guidelines for setting the DAFT parameters, revealing insights into PAPR reduction. Finally, simulation results of PAPR comparison in terms of the complementary cumulative distribution function (CCDF) show that the proposed DAFT-s-AFDMA schemes with both localized and interleaved strategies attain better PAPR performance than the conventional O-AFDMA scheme.
Submitted 5 May, 2024;
originally announced May 2024.
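The spreading structure mirrors DFT-spread OFDMA: an M-point DAFT precodes a user's symbols, which are then mapped onto M of N chirp subcarriers and passed through an N-point inverse DAFT. A minimal sketch, assuming unitary DAFTs with Λ_c = diag(exp(-j2πcn²)). With both c2 values and the spreading parameters set to zero, as in the demo, the scheme collapses to DFT-spread OFDMA (only a time-domain chirp remains), and interleaved mapping then gives a constant envelope for QPSK. The paper's guidelines for nonzero DAFT parameters are not reproduced; `transmit`, `chirp_subcarriers` and all parameter values are hypothetical.

```python
import numpy as np

def daft(x, c1, c2):
    """Unitary DAFT Lambda_c2 F Lambda_c1 x with Lambda_c = diag(exp(-j2*pi*c*n^2))."""
    n = np.arange(len(x))
    y = np.fft.fft(x * np.exp(-2j * np.pi * c1 * n**2)) / np.sqrt(len(x))
    return y * np.exp(-2j * np.pi * c2 * n**2)

def idaft(y, c1, c2):
    """Inverse of daft()."""
    n = np.arange(len(y))
    x = np.fft.ifft(y * np.exp(2j * np.pi * c2 * n**2)) * np.sqrt(len(y))
    return x * np.exp(2j * np.pi * c1 * n**2)

def chirp_subcarriers(M, N, user, mode):
    """Indices of the M chirp subcarriers (out of N) given to one user."""
    if mode == "localized":
        return np.arange(user * M, (user + 1) * M)
    if mode == "interleaved":
        return user + np.arange(M) * (N // M)
    raise ValueError(f"unknown mode: {mode}")

def transmit(d, N, user, mode, spread, c1=0.0, c2=0.0, c1s=0.0, c2s=0.0):
    """O-AFDMA (spread=False) or DAFT-spread AFDMA (spread=True) for one user.
    (c1, c2): N-point DAFT parameters; (c1s, c2s): M-point spreading DAFT parameters."""
    chips = daft(d, c1s, c2s) if spread else d
    grid = np.zeros(N, dtype=complex)
    grid[chirp_subcarriers(len(d), N, user, mode)] = chips
    return idaft(grid, c1, c2)

def papr_db(s):
    p = np.abs(s) ** 2
    return 10 * np.log10(p.max() / p.mean())

N, M = 64, 16
rng = np.random.default_rng(2)
d = (rng.choice([-1.0, 1.0], M) + 1j * rng.choice([-1.0, 1.0], M)) / np.sqrt(2)  # QPSK
c1 = 3 / (2 * N)    # hypothetical chirp value; c2 = c2s = 0 in this demo
for mode in ("localized", "interleaved"):
    print(mode,
          round(papr_db(transmit(d, N, 0, mode, spread=False, c1=c1)), 2),
          round(papr_db(transmit(d, N, 0, mode, spread=True, c1=c1)), 2))
```

The time-domain chirp has unit modulus and cannot change the envelope, so the PAPR gain in this special case comes entirely from the spreading step.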
-
Channel Estimation for AFDM With Superimposed Pilots
Authors:
Kai Zheng,
Miaowen Wen,
Tianqi Mao,
Lixia Xiao,
Zhaocheng Wang
Abstract:
The recently proposed affine frequency division multiplexing (AFDM), which employs a multi-chirp waveform, has shown its reliability and robustness in doubly selective fading channels. In existing embedded pilot-aided channel estimation methods, the presence of guard symbols in the discrete affine Fourier transform (DAFT) domain causes an inevitable degradation of the spectral efficiency (SE). To improve the SE, we propose a novel AFDM channel estimation scheme by introducing superimposed pilots in the DAFT domain. An effective pilot placement method that minimizes the channel estimation error is also developed with a rigorous proof. To mitigate the pilot-data interference, we further propose an iterative channel estimator and signal detector. Simulation results demonstrate that both channel estimation and data detection performance can be improved by the proposed scheme as the number of superimposed pilots increases.
Submitted 15 April, 2024;
originally announced April 2024.
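The interplay between superimposed pilots and iterative estimation/detection can be seen in a deliberately simplified model. A minimal sketch, assuming a single flat channel gain h with y = h(p + x) + n instead of the DAFT-domain doubly dispersive channel, QPSK data, and a 20/80 pilot/data power split. The first estimate correlates with the pilot and treats data as interference; each later pass detects the data and re-estimates with pilot and detected data together. All names and values are hypothetical.

```python
import numpy as np

def qpsk_slice(z, amp):
    """Hard QPSK decision scaled to amplitude amp."""
    return amp * (np.sign(z.real) + 1j * np.sign(z.imag)) / np.sqrt(2)

def iterative_estimate(y, pilot, data_amp, n_iter=3):
    """Superimposed-pilot estimation of a flat channel gain h from
    y = h * (pilot + data) + noise, alternating estimation and detection."""
    h = np.vdot(pilot, y) / np.vdot(pilot, pilot)    # data treated as interference
    history = [h]
    x_hat = None
    for _ in range(n_iter):
        x_hat = qpsk_slice(y / h - pilot, data_amp)  # strip the pilot, detect data
        s = pilot + x_hat                            # pilot plus detected data
        h = np.vdot(s, y) / np.vdot(s, s)            # re-estimate with data interference removed
        history.append(h)
    return h, x_hat, history

rng = np.random.default_rng(7)
N = 256
h_true = 0.8 * np.exp(0.6j)
pilot = np.sqrt(0.2) * np.exp(2j * np.pi * rng.random(N))   # 20% of power on pilots
bits = rng.integers(0, 2, size=(N, 2))
data = np.sqrt(0.8) * ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)
noise = 0.05 * (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
y = h_true * (pilot + data) + noise

h_hat, x_hat, history = iterative_estimate(y, pilot, data_amp=np.sqrt(0.8))
print([round(float(abs(h - h_true)), 4) for h in history])  # estimation error per pass
```

No guard symbols are spent on the pilot, which is the spectral-efficiency argument for superposition; the price is the pilot-data interference that the iterations remove.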
-
Intelligent Reflecting Surface-Enabled Anti-Detection for Secure Sensing and Communications
Authors:
Beixiong Zheng,
Xue Xiong,
Tiantian Ma,
Jie Tang,
Derrick Wing Kwan Ng,
A. Lee Swindlehurst,
Rui Zhang
Abstract:
The ever-increasing reliance on wireless communication and sensing has led to growing concerns over the vulnerability of sensitive information to unauthorized detection and interception. Traditional anti-detection methods are often inadequate, suffering from limited adaptability and diminished effectiveness against advanced detection technologies. To overcome these challenges, this article presents the intelligent reflecting surface (IRS) as a groundbreaking technology for enabling flexible electromagnetic manipulation, which has the potential to revolutionize anti-detection in both electromagnetic stealth/spoofing (evading radar detection) and covert communications (facilitating secure information exchange). We explore the fundamental principles of IRS and its advantages over traditional anti-detection techniques and discuss various design challenges associated with implementing IRS-based anti-detection systems. Through the examination of case studies and future research directions, we provide a comprehensive overview of the potential of IRS technology to serve as a formidable shield in the modern wireless landscape.
Submitted 21 April, 2024; v1 submitted 12 April, 2024;
originally announced April 2024.
-
Sensing-Resistance-Oriented Beamforming for Privacy Protection from ISAC Devices
Authors:
Teng Ma,
Yue Xiao,
Xia Lei,
Ming Xiao
Abstract:
With the evolution of integrated sensing and communication (ISAC) technology, a growing number of devices are equipped with sensing abilities beyond conventional communication functions. Therefore, future networks are likely to encounter new privacy concerns regarding sensing, such as the exposure of position information to unintended receivers. In contrast to traditional privacy-preserving schemes aiming to prevent eavesdropping, this contribution conceives a novel beamforming design toward sensing resistance (SR). Specifically, we aim to guarantee the communication quality while masking the real direction of the SR transmitter during communication. To evaluate the SR performance, a metric termed angular-domain peak-to-average ratio (ADPAR) is first defined and analyzed. Then, we resort to the null-space technique to conceal the real direction, thereby converting the optimization problem into a more tractable form. Moreover, semidefinite relaxation along with index optimization is utilized to obtain the optimal beamformer. Finally, simulation results demonstrate the feasibility of the proposed SR-oriented beamforming design toward privacy protection from ISAC receivers.
Submitted 8 April, 2024;
originally announced April 2024.
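The two ingredients named in the abstract, an angular peak-to-average metric and a null-space constraint, can be sketched generically. The abstract does not give the ADPAR formula, so we assume it is the maximum over an angle grid of the radiated power |a(θ)^H w|² divided by its mean over that grid. The null-space step shown is the generic one, projecting a beamformer onto the orthogonal complement of a steering vector so that nothing is radiated toward that angle; the paper's SDR-based design is not reproduced. The channel, the 30° direction and all names are hypothetical.

```python
import numpy as np

def steering(theta_deg, n_ant):
    """Half-wavelength ULA steering vector."""
    n = np.arange(n_ant)
    return np.exp(1j * np.pi * n * np.sin(np.deg2rad(theta_deg)))

def angular_profile(w, grid_deg):
    """Radiated power |a(theta)^H w|^2 on an angle grid."""
    A = np.stack([steering(t, len(w)) for t in grid_deg])   # (G, N)
    return np.abs(A.conj() @ w) ** 2

def adpar(w, grid_deg):
    """Angular-domain peak-to-average ratio (assumed definition: max / mean over the grid)."""
    p = angular_profile(w, grid_deg)
    return p.max() / p.mean()

def null_project(w, theta_null_deg):
    """Project w onto the null space of a(theta_null)^H, removing radiation toward theta_null."""
    a = steering(theta_null_deg, len(w))
    return w - a * (np.vdot(a, w) / np.vdot(a, a))

rng = np.random.default_rng(3)
N = 16
grid = np.linspace(-90.0, 90.0, 721)
h = rng.standard_normal(N) + 1j * rng.standard_normal(N)  # hypothetical user channel
w_mrt = h / np.linalg.norm(h)                             # matched-filter beamformer
w_sr = null_project(w_mrt, 30.0)                          # 30 deg: hypothetical direction to hide
w_sr = w_sr / np.linalg.norm(w_sr)
print(round(float(adpar(w_mrt, grid)), 2), round(float(adpar(w_sr, grid)), 2))
print(float(angular_profile(w_sr, [30.0])[0]))            # close to zero
print(float(abs(np.vdot(h, w_sr)) ** 2))                  # the user still receives power
```

A low ADPAR means a flat angular profile that reveals little about the transmitter's direction; the projection keeps most of the user gain as long as the user channel is not aligned with the suppressed steering vector.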
-
Pre-Chirp-Domain Index Modulation for Affine Frequency Division Multiplexing
Authors:
Guangyao Liu,
Tianqi Mao,
Ruiqi Liu,
Zhenyu Xiao
Abstract:
Affine frequency division multiplexing (AFDM), tailored as a novel multicarrier technique utilizing chirp signals for high-mobility communications, exhibits marked advantages compared to traditional orthogonal frequency division multiplexing (OFDM). AFDM is based on the discrete affine Fourier transform (DAFT) with two modifiable parameters of the chirp signals, termed the pre-chirp parameter and the post-chirp parameter, respectively. These parameters can be fine-tuned to avoid overlapping channel paths with different delays or Doppler shifts, leading to performance enhancement especially for doubly dispersive channels. In this paper, we propose a novel AFDM structure with the pre-chirp index modulation (PIM) philosophy (AFDM-PIM), which can embed additional information bits into the pre-chirp parameter design for both spectral and energy efficiency enhancement. Specifically, we first demonstrate that applying distinct pre-chirp parameters to various subcarriers in the AFDM modulation process maintains the orthogonality among these subcarriers. Then, different pre-chirp parameters are flexibly assigned to each AFDM subcarrier according to the incoming bits. By such an arrangement, aside from classical phase/amplitude modulation, extra binary bits can be implicitly conveyed by the indices of the selected pre-chirp parameter realizations without additional energy consumption. At the receiver, both a maximum likelihood (ML) detector and a reduced-complexity ML-minimum mean square error (ML-MMSE) detector are employed to recover the information bits. Simulations show that the proposed AFDM-PIM exhibits superior bit error rate (BER) performance compared to classical AFDM, OFDM and IM-aided OFDM algorithms.
Submitted 23 February, 2024;
originally announced February 2024.
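The two claims in the abstract, that index bits select a pre-chirp value per subcarrier and that per-subcarrier pre-chirps preserve orthogonality, can be checked numerically. A minimal sketch, assuming the inverse DAFT Λ_c1^H F^H Λ_c2^H with Λ_c = diag(exp(-j2πcn²)) and assuming the pre-chirp parameter is c2. The candidate set and `pim_map` are hypothetical. The sketch checks orthogonality only, not detectability: for instance, every candidate gives the same phase on subcarrier n = 0, so a practical design must avoid such ambiguous indices.

```python
import numpy as np
from math import floor, log2

def pim_map(index_bits, candidates, N):
    """Map index bits to one pre-chirp value (c2) per subcarrier."""
    k = floor(log2(len(candidates)))                 # bits carried per subcarrier
    if len(index_bits) != k * N:
        raise ValueError("need k * N index bits")
    idx = [int("".join(str(b) for b in index_bits[i * k:(i + 1) * k]), 2) for i in range(N)]
    return np.asarray(candidates)[idx], idx

def idaft_matrix(c1, c2_vec):
    """Inverse DAFT Lambda_c1^H F^H Lambda_c2^H with a per-subcarrier c2."""
    N = len(c2_vec)
    n = np.arange(N)
    F = np.fft.fft(np.eye(N)) / np.sqrt(N)           # unitary DFT matrix
    L1 = np.diag(np.exp(-2j * np.pi * c1 * n**2))
    L2 = np.diag(np.exp(-2j * np.pi * np.asarray(c2_vec) * n**2))
    return L1.conj().T @ F.conj().T @ L2.conj().T

N = 8
candidates = [0.0, 0.11, 0.23, 0.37]      # hypothetical pre-chirp candidates (2 bits each)
c1 = 3 / (2 * N)
rng = np.random.default_rng(5)
index_bits = rng.integers(0, 2, 2 * N).tolist()
c2_vec, idx = pim_map(index_bits, candidates, N)
A_inv = idaft_matrix(c1, c2_vec)
print("extra bits per block:", len(index_bits))
print("unitary:", np.allclose(A_inv.conj().T @ A_inv, np.eye(N)))
```

Orthogonality follows because a diagonal matrix of unit-modulus entries is unitary, regardless of which c2 each subcarrier uses.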
-
Real-Time Asphalt Pavement Layer Thickness Prediction Using Ground-Penetrating Radar Based on a Modified Extended Common Mid-Point (XCMP) Approach
Authors:
Siqi Wang,
Zhen Leng,
Xin Sui,
Weiguang Zhang,
Tao Ma,
Zehui Zhu
Abstract:
The conventional surface reflection method has been widely used to measure the asphalt pavement layer dielectric constant using ground-penetrating radar (GPR). This method may be inaccurate for in-service pavement thickness estimation when the dielectric constant varies with depth, a limitation that could be addressed using the extended common mid-point method (XCMP) with air-coupled GPR antennas. However, the factors affecting the thickness prediction accuracy of the XCMP method have not been studied. Moreover, the method requires manual acquisition of key factors, which hinders its real-time application. This study investigates these factors and develops a modified XCMP method that allows automatic thickness prediction of in-service asphalt pavement with non-uniform dielectric properties through depth. A sensitivity analysis was performed, revealing the need for accurate estimation of the times of flight (TOFs) from antenna pairs. A modified XCMP method based on edge detection was proposed to allow real-time TOF estimation, followed by dielectric constant and thickness prediction. Field tests using a multi-channel GPR system were performed for validation, with both the surface reflection and XCMP setups. Results show that the modified XCMP method achieves a mean prediction error of 1.86%, lower than that of the surface reflection method (5.73%), and it is therefore recommended.
Submitted 6 January, 2024;
originally announced January 2024.
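The geometry behind common mid-point (CMP) velocity estimation fits in a few lines. A minimal sketch, assuming a straight-ray, ground-coupled layer: for antenna offset x_i the two-way time satisfies (v t_i)² = x_i² + (2d)², so t² is linear in x² and a least-squares fit yields the wave velocity v, the relative permittivity (c/v)² and the thickness d. The air-coupled XCMP geometry, with refraction at the air-pavement interface, and the paper's edge-detection TOF picking are not modelled. The common surface reflection formula is included for comparison; the synthetic layer values are made up.

```python
import numpy as np

C = 0.299792458  # speed of light in m/ns

def cmp_layer(offsets_m, tofs_ns):
    """Relative permittivity and thickness of one layer from a CMP gather.

    Assumes a straight-ray, ground-coupled geometry: (v * t_i)^2 = x_i^2 + (2 d)^2,
    so t^2 is linear in x^2.
    """
    x2 = np.asarray(offsets_m, dtype=float) ** 2
    t2 = np.asarray(tofs_ns, dtype=float) ** 2
    slope, intercept = np.polyfit(x2, t2, 1)   # t^2 = x^2 / v^2 + 4 d^2 / v^2
    v = 1.0 / np.sqrt(slope)
    d = v * np.sqrt(intercept) / 2.0
    return (C / v) ** 2, d                     # (relative permittivity, thickness in m)

def surface_reflection_eps(a_surface, a_plate):
    """Surface reflection method: permittivity from the surface echo amplitude
    normalised by a metal-plate echo amplitude."""
    r = a_surface / a_plate
    return ((1 + r) / (1 - r)) ** 2

# Synthetic check: 50 mm layer with relative permittivity 6 (made-up values)
eps_true, d_true = 6.0, 0.05
v_true = C / np.sqrt(eps_true)
offsets = np.array([0.1, 0.2, 0.3, 0.4])
tofs = np.sqrt(offsets**2 + (2 * d_true) ** 2) / v_true
eps_est, d_est = cmp_layer(offsets, tofs)
print(eps_est, d_est)
r = (np.sqrt(eps_true) - 1) / (np.sqrt(eps_true) + 1)    # ideal surface reflection coefficient
print(surface_reflection_eps(r, 1.0))
```

The surface reflection formula only sees the top interface, which is why it cannot capture permittivity that varies with depth, while the multi-offset TOFs constrain the bulk velocity of the layer.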
-
Near-Space Communications: the Last Piece of 6G Space-Air-Ground-Sea Integrated Network Puzzle
Authors:
Hongshan Liu,
Tong Qin,
Zhen Gao,
Tianqi Mao,
Keke Ying,
Ziwei Wan,
Li Qiao,
Rui Na,
Zhongxiang Li,
Chun Hu,
Yikun Mei,
Tuan Li,
Guanghui Wen,
Lei Chen,
Zhonghuai Wu,
Ruiqi Liu,
Gaojie Chen,
Shuo Wang,
Dezhi Zheng
Abstract:
This article presents a comprehensive study of the emerging near-space communications (NS-COM) within the context of the space-air-ground-sea integrated network (SAGSIN). Specifically, we first explore the recent technical developments of NS-COM, followed by a discussion of the motivations behind integrating NS-COM into SAGSIN. To further demonstrate the necessity of NS-COM, a comparative analysis between the NS-COM network and other counterparts in SAGSIN is conducted, covering deployment, coverage, channel characteristics and the unique problems of the NS-COM network. Afterwards, the technical aspects of NS-COM, including channel modeling, random access, channel estimation, array-based beam management and joint network optimization, are examined in detail. Furthermore, we explore the potential applications of NS-COM, such as structural expansion in SAGSIN communication, civil aviation communication, remote and urgent communication, weather monitoring and carbon neutrality. Finally, some promising research avenues are identified, including stratospheric satellite (StratoSat)-to-ground direct links for mobile terminals, reconfigurable multiple-input multiple-output (MIMO) and holographic MIMO, federated learning in NS-COM networks, maritime communication, electromagnetic spectrum sensing and adversarial games, integrated sensing and communications, StratoSat-based radar detection and imaging, NS-COM-assisted enhanced global navigation systems, NS-COM-assisted intelligent unmanned systems and free space optical (FSO) communication. Overall, this paper highlights that NS-COM plays an indispensable role in the SAGSIN puzzle, providing substantial performance and coverage enhancement to the traditional SAGSIN architecture.
Submitted 4 March, 2024; v1 submitted 30 December, 2023;
originally announced January 2024.
-
Near-Field Sparse Channel Estimation for Extremely Large-Scale RIS-Aided Wireless Communications
Authors:
Zixing Tang,
Yuanbin Chen,
Ying Wang,
Tianqi Mao,
Qingqing Wu,
Marco Di Renzo,
Lajos Hanzo
Abstract:
A significant increase in the number of reconfigurable intelligent surface (RIS) elements results in a spherical wavefront in the near field of an extremely large-scale RIS (XL-RIS). Although the channel matrix of the cascaded two-hop link may become sparse in the polar-domain representation, accurate estimation of these polar-domain parameters cannot be readily guaranteed. To tackle this challenge, we exploit the sparsity inherent in the cascaded channel. To elaborate, we first estimate the significant path angles and distances corresponding to the common paths between the BS and the XL-RIS. Then, the individual path parameters associated with different users are recovered. This results in a two-stage channel estimation scheme, in which distinct learning-based networks are used for channel training at each stage. More explicitly, in stage I, a denoising convolutional neural network (DnCNN) is employed, treating the grid mismatches as noise, to determine the true grid indices of the angles and distances. By contrast, an iterative shrinkage thresholding algorithm (ISTA) based network is proposed for adaptively adjusting the column coherence of the dictionary matrix in stage II. Finally, our simulation results demonstrate that the proposed two-stage learning-based channel estimation outperforms the state-of-the-art benchmarks.
Submitted 13 November, 2023;
originally announced November 2023.
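Stage II builds on the iterative shrinkage thresholding algorithm (ISTA), which solves min_x ½‖y − Ax‖² + λ‖x‖₁ by alternating a gradient step with soft-thresholding. A minimal sketch of plain ISTA, assuming a random complex Gaussian dictionary as a stand-in for the polar-domain dictionary and three active paths; the paper's learned, coherence-adapting ISTA network and the DnCNN stage are not reproduced. Sizes, λ and the seed are hypothetical.

```python
import numpy as np

def soft_threshold(z, tau):
    """Complex soft-thresholding: shrink magnitudes by tau, keep phases."""
    mag = np.abs(z)
    return np.maximum(1 - tau / np.maximum(mag, 1e-12), 0.0) * z

def ista(y, A, lam, n_iter=1000):
    """Plain ISTA for min_x 0.5 * ||y - A x||^2 + lam * ||x||_1."""
    L = np.linalg.norm(A, 2) ** 2                   # Lipschitz constant of the data-fit gradient
    x = np.zeros(A.shape[1], dtype=complex)
    for _ in range(n_iter):
        grad = A.conj().T @ (A @ x - y)             # gradient of the quadratic term
        x = soft_threshold(x - grad / L, lam / L)   # proximal step for the l1 penalty
    return x

rng = np.random.default_rng(11)
M, G, K = 64, 256, 3            # measurements, dictionary atoms (grid points), paths
A = (rng.standard_normal((M, G)) + 1j * rng.standard_normal((M, G))) / np.sqrt(2 * M)
support = rng.choice(G, K, replace=False)
x_true = np.zeros(G, dtype=complex)
x_true[support] = np.exp(2j * np.pi * rng.random(K))
y = A @ x_true + 0.01 * (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)

x_hat = ista(y, A, lam=0.02)
print(sorted(support.tolist()), sorted(np.argsort(np.abs(x_hat))[-K:].tolist()))
```

A polar-domain dictionary has far more correlated columns than this random one, which slows plain ISTA and motivates a learned variant that adapts to the dictionary's column coherence.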
-
Cross-Domain Dual-Functional OFDM Waveform Design for Accurate Sensing/Positioning
Authors:
Fan Zhang,
Tianqi Mao,
Ruiqi Liu,
Zhu Han,
Sheng Chen,
Zhaocheng Wang
Abstract:
Orthogonal frequency division multiplexing (OFDM) has been widely recognized as the representative waveform for 5G wireless networks, which can directly support sensing/positioning with existing infrastructure. To guarantee superior sensing/positioning accuracy while simultaneously supporting high-speed communications, the dual functions tend to be assigned different resource elements (REs) due to their diverse design requirements. This motivates optimization of resource allocation/waveform design across the time, frequency, power and delay-Doppler domains. Therefore, this article proposes two cross-domain waveform optimization strategies for effective convergence of OFDM-based communications and sensing/positioning, following communication- and sensing-centric criteria, respectively. For the communication-centric design, to maximize the achievable data rate, a fraction of REs are optimally allocated for communications according to prior knowledge of the communication channel. The remaining REs are then employed for sensing/positioning, where the sidelobe level and peak-to-average power ratio are suppressed by optimizing their power-frequency and phase-frequency characteristics for sensing performance improvement. For the sensing-centric design, a `locally' perfect auto-correlation property is ensured for accurate sensing and positioning by adjusting the unit cells of the ambiguity function within its region of interest (RoI). Afterwards, the irrelevant cells beyond the RoI, which can readily determine the sensing power allocation, are optimized jointly with the communication power allocation to enhance the achievable data rate. Numerical results demonstrate the superiority of the proposed waveform designs.
Submitted 19 March, 2024; v1 submitted 8 November, 2023;
originally announced November 2023.
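The communication-centric criterion above can be illustrated with a minimal Python sketch. It assumes equal power on every RE, in which case giving the K strongest REs (by prior channel gain) to communications maximizes the sum rate for that K; the paper's actual optimization across the time, frequency, power and delay-Doppler domains is not reproduced. The remaining REs carry the sensing symbol, and `papr` measures the quantity the paper suppresses by optimizing phase-frequency characteristics. All function names and the toy gains are hypothetical.

```python
import cmath
import math


def allocate_res(channel_gains, num_comm):
    """Give the num_comm strongest REs to communications, the rest to sensing.

    Simplified rule (assumption): with equal power per RE, the sum rate
    sum(log2(1 + snr * g)) over a fixed number of REs is maximized by the
    largest gains, so a top-K selection suffices in this sketch.
    """
    order = sorted(range(len(channel_gains)), key=lambda k: channel_gains[k], reverse=True)
    return sorted(order[:num_comm]), sorted(order[num_comm:])


def sum_rate(channel_gains, comm_res, snr):
    """Achievable rate in bit/s/Hz summed over the communication REs."""
    return sum(math.log2(1.0 + snr * channel_gains[k]) for k in comm_res)


def papr(freq_symbols):
    """Peak-to-average power ratio (linear) of one OFDM symbol.

    Time samples are the inverse DFT of the subcarrier symbols; the 1/N
    scaling cancels in the ratio, so it is omitted.
    """
    n = len(freq_symbols)
    time_samples = [
        sum(s * cmath.exp(2j * math.pi * k * t / n) for k, s in enumerate(freq_symbols))
        for t in range(n)
    ]
    powers = [abs(x) ** 2 for x in time_samples]
    return max(powers) / (sum(powers) / n)


gains = [0.2, 1.5, 0.7, 2.1, 0.1, 0.9, 1.2, 0.4]  # hypothetical per-RE channel gains
comm, sensing = allocate_res(gains, num_comm=4)
print(comm, sensing)  # [1, 3, 5, 6] [0, 2, 4, 7]
# Unit-power, co-phased symbols on the sensing REs: the worst case for PAPR.
sensing_symbol = [1.0 if k in sensing else 0.0 for k in range(len(gains))]
print(round(papr(sensing_symbol), 6))  # 4.0
```

With all sensing tones co-phased, the PAPR equals the number of active tones; lowering it is what a phase-frequency optimization of the sensing REs would aim for.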
-
AGMDT: Virtual Staining of Renal Histology Images with Adjacency-Guided Multi-Domain Transfer
Authors:
Tao Ma,
Chao Zhang,
Min Lu,
Lin Luo
Abstract:
Renal pathology, the gold standard of kidney disease diagnosis, requires doctors to analyze a series of tissue slices stained with H&E and with special stains such as Masson, PASM, and PAS. These special staining methods are costly, time-consuming, and hard to standardize for wide use, especially in primary hospitals. Advances in supervised learning methods have enabled the virtual conversion of H&E images into special staining images, but achieving the pixel-to-pixel alignment needed for training remains challenging. In contrast, unsupervised learning methods that treat different stains as different style transfer domains can utilize unpaired data, but they ignore spatial inter-domain correlations and thus reduce the trustworthiness of structural details for diagnosis. In this paper, we propose a novel virtual staining framework, AGMDT, that translates images into other domains while avoiding pixel-level alignment and exploiting the correlations among adjacent tissue slices. We first build a high-quality multi-domain renal histological dataset in which each specimen comprises a series of slices stained in various ways. Based on it, AGMDT discovers patch-level aligned pairs across the serial slices of multiple domains through glomerulus detection and bipartite graph matching, and uses these correlations to supervise the end-to-end model for multi-domain staining transformation. Experimental results show that AGMDT achieves a good balance between precise pixel-level alignment and unpaired domain transfer by exploiting correlations across multi-domain serial pathological slices, and outperforms state-of-the-art methods in both quantitative measures and morphological details.
Submitted 17 September, 2023; v1 submitted 12 September, 2023;
originally announced September 2023.
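AGMDT pairs patches across serial slices through glomerulus detection and bipartite graph matching. The matching step can be sketched as a minimum-cost one-to-one assignment between detected glomerulus centroids on two adjacent slices. The Euclidean centroid distance as edge cost, the `max_dist` rejection threshold and the function name are assumptions for illustration, not details from the paper. The exhaustive search only suits a few detections per patch; `scipy.optimize.linear_sum_assignment` solves the same assignment problem efficiently for larger sets.

```python
import itertools
import math


def match_glomeruli(centroids_a, centroids_b, max_dist=50.0):
    """Minimum-cost matching of glomerulus centroids on two adjacent slices.

    Brute force over permutations (exponential, fine for a handful of
    detections). Assumes len(centroids_a) <= len(centroids_b). Matched pairs
    farther apart than max_dist pixels (hypothetical threshold) are dropped.
    """
    if len(centroids_a) > len(centroids_b):
        raise ValueError("expected len(centroids_a) <= len(centroids_b)")
    best_cost, best_perm = math.inf, ()
    for perm in itertools.permutations(range(len(centroids_b)), len(centroids_a)):
        cost = sum(math.dist(centroids_a[i], centroids_b[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return [
        (i, j)
        for i, j in enumerate(best_perm)
        if math.dist(centroids_a[i], centroids_b[j]) <= max_dist
    ]


slice_he = [(10, 10), (200, 40), (120, 300)]  # hypothetical centroids (pixels)
slice_pas = [(118, 305), (14, 8), (205, 45), (400, 400)]
print(match_glomeruli(slice_he, slice_pas))  # [(0, 1), (1, 2), (2, 0)]
```

Each returned pair would then define one patch-level correspondence between the two stains.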
-
Fake the Real: Backdoor Attack on Deep Speech Classification via Voice Conversion
Authors:
Zhe Ye,
Terui Mao,
Li Dong,
Diqun Yan
Abstract:
Deep speech classification has achieved tremendous success and greatly promoted the emergence of many real-world applications. However, backdoor attacks present a new security threat to it, particularly when untrustworthy third-party platforms are involved, as pre-defined triggers set by the attacker can activate the backdoor. Most triggers in existing speech backdoor attacks are sample-agnostic, and even when they are designed to be unnoticeable, they can still be audible. This work explores a backdoor attack that uses sample-specific triggers based on voice conversion. Specifically, we adopt a pre-trained voice conversion model to generate the trigger, ensuring that the poisoned samples do not introduce any additional audible noise. Extensive experiments on two speech classification tasks demonstrate the effectiveness of our attack. Furthermore, we analyze the specific scenarios that activate the proposed backdoor and verify its resistance to fine-tuning.
Submitted 27 June, 2023;
originally announced June 2023.
-
NexToU: Efficient Topology-Aware U-Net for Medical Image Segmentation
Authors:
Pengcheng Shi,
Xutao Guo,
Yanwu Yang,
Chenfei Ye,
Ting Ma
Abstract:
Convolutional neural networks (CNN) and Transformer variants have emerged as the leading medical image segmentation backbones. Nonetheless, due to their limitations in either preserving global image context or efficiently processing irregular shapes in visual objects, these backbones struggle to effectively integrate information from diverse anatomical regions and reduce inter-individual variability, particularly for the vasculature. Motivated by the successful breakthroughs of graph neural networks (GNN) in capturing topological properties and non-Euclidean relationships across various fields, we propose NexToU, a novel hybrid architecture for medical image segmentation. NexToU comprises improved Pool GNN and Swin GNN modules from Vision GNN (ViG) for learning both global and local topological representations while minimizing computational costs. To address the containment and exclusion relationships among various anatomical structures, we reformulate the topological interaction (TI) module based on the nature of binary trees, rapidly encoding the topological constraints into NexToU. Extensive experiments conducted on three datasets (including distinct imaging dimensions, disease types, and imaging modalities) demonstrate that our method consistently outperforms other state-of-the-art (SOTA) architectures. All the code is publicly available at https://github.com/PengchengShi1220/NexToU.
Submitted 25 May, 2023;
originally announced May 2023.
-
6G Enabled Advanced Transportation Systems
Authors:
Ruiqi Liu,
Meng Hua,
Ke Guan,
Xiping Wang,
Leyi Zhang,
Tianqi Mao,
Di Zhang,
Qingqing Wu,
Abbas Jamalipour
Abstract:
With the emergence of communication services with stringent requirements, such as autonomous driving or in-flight Internet, the sixth-generation (6G) wireless network is envisaged to become an enabling technology for future transportation systems. In this paper, two kinds of interaction between 6G networks and transportation are extensively investigated. On the one hand, the new usage scenarios and capabilities of 6G over existing cellular networks are first highlighted. Then, its potential for seamless and ubiquitous connectivity across heterogeneous space-air-ground transportation systems, covering railways, airplanes, high-altitude platforms and satellites, is demonstrated. On the other hand, we reveal that the introduction of 6G guarantees a more intelligent, efficient and secure transportation system. Specifically, a technical analysis of how 6G can empower future transportation is provided, based on the latest research and standardization progress in localization, integrated sensing and communications, and security. The technical challenges and insights for the road ahead are also summarized to inspire future work on 6G-enabled advanced transportation.
Submitted 11 December, 2023; v1 submitted 24 May, 2023;
originally announced May 2023.
-
Doppler-Resilient Design of CAZAC Sequences for mmWave/THz Sensing Applications
Authors:
Fan Zhang,
Tianqi Mao,
Zhaocheng Wang
Abstract:
Ultra-high-resolution target sensing has emerged as a key enabler for various cutting-edge applications, and can be realized by utilizing millimeter-wave/terahertz frequencies. However, the extremely high operating frequency inevitably leads to significant Doppler shift effects, especially in high-mobility applications, degrading the sensing performance with a high false alarm rate. To this end, this paper proposes a parameter design methodology for the well-known constant amplitude zero autocorrelation (CAZAC) sequences, which aims at enhancing their resilience to Doppler shifts. Specifically, we suppress the sidelobes incurred by Doppler shifts to improve the peak-to-sidelobe ratio (PSLR) within the range of interest (RoI) of the radar range profile. The Zadoff-Chu (ZC) sequence, a representative member of the CAZAC family, is considered first. The impact of its root index on range sidelobes is investigated based on number theory. For a ZC sequence of arbitrary length, a feasible range of the root index is derived to satisfy the PSLR requirement within the RoI. Furthermore, these design guidelines are extended to a general form of CAZAC sequences, for which a low-complexity heuristic algorithm is developed to improve the PSLR. Simulation results demonstrate that under severe Doppler shifts, the proposed methodology enhances the sensing performance by lowering the false alarm rate while maintaining the same detection rate, compared with its classical counterpart.
Submitted 12 May, 2023;
originally announced May 2023.
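The Doppler sensitivity that motivates this paper can be reproduced in a few lines of Python, assuming an odd-length ZC sequence and a Doppler shift of an integer number d of subcarrier spacings. Under these assumptions the shifted sequence is a cyclic shift of the original, so the matched-filter peak moves from lag 0 to lag (-d * u^(-1)) mod N, a false range that depends on the root index u. The sketch does not implement the paper's root-index design rule or its heuristic for general CAZAC sequences; function names are hypothetical.

```python
import cmath
import math


def zadoff_chu(u, n):
    """Odd-length Zadoff-Chu sequence x[k] = exp(-j*pi*u*k*(k+1)/n)."""
    if n % 2 == 0 or math.gcd(u, n) != 1:
        raise ValueError("this sketch assumes odd n and gcd(u, n) == 1")
    return [cmath.exp(-1j * math.pi * u * k * (k + 1) / n) for k in range(n)]


def periodic_xcorr(received, reference):
    """|r[l]| with r[l] = sum_k received[k] * conj(reference[(k + l) mod n])."""
    n = len(reference)
    return [
        abs(sum(received[k] * reference[(k + l) % n].conjugate() for k in range(n)))
        for l in range(n)
    ]


def doppler_shift(seq, d):
    """Apply a Doppler shift of d subcarrier spacings (integer bins only)."""
    n = len(seq)
    return [s * cmath.exp(2j * math.pi * d * k / n) for k, s in enumerate(seq)]


n, u, d = 31, 5, 1
x = zadoff_chu(u, n)
profile = periodic_xcorr(doppler_shift(x, d), x)
peak_lag = max(range(n), key=profile.__getitem__)
print(peak_lag, (-d * pow(u, -1, n)) % n)  # 6 6
```

Because the zero-autocorrelation property still holds after the shift, the whole energy lands on one wrong lag; which lag it is depends only on u, d and N, which is why the choice of root index controls where Doppler-induced range sidelobes fall relative to the RoI.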
-
DECOR-NET: A COVID-19 Lung Infection Segmentation Network Improved by Emphasizing Low-level Features and Decorrelating Features
Authors:
Jiesi Hu,
Yanwu Yang,
Xutao Guo,
Ting Ma
Abstract:
Since 2019, coronavirus disease 2019 (COVID-19) has spread widely and posed a serious threat to public health. Chest computed tomography (CT) holds great potential for screening and diagnosis of this disease. Segmentation of COVID-19 CT images enables quantitative evaluation of infections and tracking of disease progression. COVID-19 infections are characterized by high heterogeneity and unclear boundaries, so capturing low-level features such as texture and intensity is critical for segmentation. However, segmentation networks that emphasize low-level features are still lacking. In this work, we propose DECOR-Net, which captures more decorrelated low-level features. A channel re-weighting strategy is applied to obtain abundant low-level features, and the dependencies between channels are reduced by the proposed decorrelation loss. Experiments show that DECOR-Net outperforms other cutting-edge methods and surpasses the baseline by 5.1% and 4.9% in terms of Dice coefficient and intersection over union, respectively. Moreover, the proposed decorrelation loss consistently improves performance under different settings. The code is available at https://github.com/jiesihu/DECOR-Net.git.
Submitted 27 February, 2023;
originally announced February 2023.
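The DECOR-Net abstract does not give the form of its decorrelation loss. A common way to penalize inter-channel dependence, assumed here purely for illustration, is the mean squared off-diagonal Pearson correlation between flattened channel activations; the plain-Python sketch below computes it. In training, the same quantity would be computed with differentiable tensor operations and added to the segmentation loss with some weight (a hypothetical hyperparameter).

```python
import math


def decorrelation_loss(channels):
    """Mean squared off-diagonal Pearson correlation between feature channels.

    channels: list of C channels, each a flat list of activations (e.g. a
    feature map flattened over space and batch). Returns 0 when channels are
    pairwise uncorrelated and 1 when every pair is perfectly (anti-)correlated.
    """
    centered = []
    for ch in channels:
        mean = sum(ch) / len(ch)
        dev = [v - mean for v in ch]
        norm = math.sqrt(sum(v * v for v in dev)) or 1.0  # guard constant channels
        centered.append([v / norm for v in dev])
    c = len(centered)
    if c < 2:
        return 0.0
    total = 0.0
    for i in range(c):
        for j in range(c):
            if i != j:
                rho = sum(a * b for a, b in zip(centered[i], centered[j]))
                total += rho * rho
    return total / (c * (c - 1))


# Channels 0 and 1 are redundant copies; channel 2 is uncorrelated with both.
print(round(decorrelation_loss([[1, 2, 3, 4], [2, 4, 6, 8], [4, -1, -1, 4]]), 4))  # 0.3333
```

Minimizing such a term pushes redundant channels apart, which matches the abstract's stated aim of reducing dependencies between channels.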
-
ReGANIE: Rectifying GAN Inversion Errors for Accurate Real Image Editing
Authors:
Bingchuan Li,
Tianxiang Ma,
Peng Zhang,
Miao Hua,
Wei Liu,
Qian He,
Zili Yi
Abstract:
The StyleGAN family succeeds in high-fidelity image generation and allows flexible and plausible editing of generated images by manipulating the semantically rich latent style space. However, projecting a real image into its latent space encounters an inherent trade-off between inversion quality and editability. Existing encoder-based or optimization-based StyleGAN inversion methods attempt to mitigate the trade-off but achieve limited performance. To fundamentally resolve this problem, we propose a novel two-phase framework that designates two separate networks to tackle editing and reconstruction, respectively, instead of balancing the two. Specifically, in Phase I, a W-space-oriented StyleGAN inversion network is trained and used to perform image inversion and editing, which ensures editability but sacrifices reconstruction quality. In Phase II, a carefully designed rectifying network is used to rectify the inversion errors and perform ideal reconstruction. Experimental results show that our approach yields near-perfect reconstructions without sacrificing editability, thus allowing accurate manipulation of real images. Further, we evaluate our rectifying network and observe strong generalization to unseen manipulation types and out-of-domain images.
Submitted 30 January, 2023;
originally announced January 2023.
-
Accelerating Diffusion Models via Pre-segmentation Diffusion Sampling for Medical Image Segmentation
Authors:
Xutao Guo,
Yanwu Yang,
Chenfei Ye,
Shang Lu,
Yang Xiang,
Ting Ma
Abstract:
Based on the Denoising Diffusion Probabilistic Model (DDPM), medical image segmentation can be described as a conditional image generation task, which makes it possible to compute pixel-wise uncertainty maps of the segmentation and to form an implicit ensemble of segmentations that boosts segmentation performance. However, DDPM requires many iterative denoising steps to generate segmentations from Gaussian noise, resulting in extremely inefficient inference. To mitigate this issue, we propose a principled acceleration strategy, called pre-segmentation diffusion sampling DDPM (PD-DDPM), which is specifically designed for medical image segmentation. The key idea is to obtain pre-segmentation results from a separately trained segmentation network and to construct noisy predictions (non-Gaussian distribution) according to the forward diffusion rule. We can then start from these noisy predictions and use fewer reverse steps to generate segmentation results. Experiments show that PD-DDPM yields better segmentation results than representative baseline methods even when the number of reverse steps is significantly reduced. Moreover, PD-DDPM is orthogonal to existing advanced segmentation models, with which it can be combined to further improve segmentation performance.
Submitted 26 October, 2022;
originally announced October 2022.
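The PD-DDPM starting point can be sketched with the standard DDPM forward rule x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, applied to a pre-segmentation instead of a ground-truth mask. The linear beta schedule (1e-4 to 0.02 over 1000 steps, as in the original DDPM formulation), the [-1, 1] mask scaling, the start step and the function names are assumptions here; the reverse denoising network is not shown.

```python
import math
import random


def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t of a linear DDPM beta schedule (0-based t)."""
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def noisy_start(pre_seg, t_start, alpha_bars, rng=random):
    """Diffuse a pre-segmentation to step t_start with the forward rule
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, 1).
    """
    a = alpha_bars[t_start]
    return [math.sqrt(a) * x0 + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x0 in pre_seg]


alpha_bars = linear_alpha_bars()
pre_seg = [1.0, 1.0, -1.0, -1.0]  # hypothetical pre-segmentation scaled to [-1, 1]
t_start = 100  # hypothetical: start the reverse process at step 100 instead of 999
x_t = noisy_start(pre_seg, t_start, alpha_bars, random.Random(0))
```

Because x_t already carries the coarse mask, the reverse chain only has to run from t_start down to 0, which is where the speed-up comes from.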
-
Multi-modal Dynamic Graph Network: Coupling Structural and Functional Connectome for Disease Diagnosis and Classification
Authors:
Yanwu Yang,
Xutao Guo,
Zhikai Chang,
Chenfei Ye,
Yang Xiang,
Ting Ma
Abstract:
Multi-modal neuroimaging technology has greatly improved diagnostic efficiency and accuracy by providing complementary information for discovering objective disease biomarkers. Conventional deep learning methods, e.g., convolutional neural networks, overlook relationships between nodes and fail to capture topological properties in graphs. Graph neural networks have proven to be of great importance in modeling brain connectome networks and relating disease-specific patterns. However, most existing graph methods explicitly require known graph structures, which are not available for the sophisticated brain system. Especially in heterogeneous multi-modal brain networks, modeling interactions among brain regions while accounting for inter-modal dependencies remains a great challenge. In this study, we propose a Multi-modal Dynamic Graph Convolution Network (MDGCN) for structural and functional brain network learning. Our method benefits from modeling inter-modal representations and incorporating attentive multi-modal associations into dynamic graphs with a compositional correspondence matrix. Moreover, a bilateral graph convolution layer is proposed to aggregate multi-modal representations in terms of multi-modal associations. Extensive experiments on three datasets demonstrate the superiority of the proposed method in disease classification, with accuracies of 90.4%, 85.9% and 98.3% in predicting mild cognitive impairment (MCI), Parkinson's disease (PD), and schizophrenia (SCHZ), respectively. Furthermore, our statistical evaluations of the correspondence matrix show high agreement with previous evidence of biomarkers.
Submitted 24 October, 2022;
originally announced October 2022.
-
AIM 2022 Challenge on Instagram Filter Removal: Methods and Results
Authors:
Furkan Kınlı,
Sami Menteş,
Barış Özcan,
Furkan Kıraç,
Radu Timofte,
Yi Zuo,
Zitao Wang,
Xiaowen Zhang,
Yu Zhu,
Chenghua Li,
Cong Leng,
Jian Cheng,
Shuai Liu,
Chaoyu Feng,
Furui Bai,
Xiaotao Wang,
Lei Lei,
Tianzhi Ma,
Zihan Gao,
Wenxin He,
Woon-Ha Yeo,
Wang-Taek Oh,
Young-Il Kim,
Han-Cheol Ryu,
Gang He
, et al. (8 additional authors not shown)
Abstract:
This paper introduces the methods and results of the AIM 2022 challenge on Instagram Filter Removal. Social media filters transform images through consecutive non-linear operations, and the feature maps of the original content may be interpolated into a different domain. This reduces the overall performance of recent deep learning strategies. The main goal of this challenge is to produce realistic and visually plausible images in which the impact of the applied filters is mitigated while the content is preserved. The proposed solutions are ranked by their PSNR with respect to the original images. Two prior studies on this task serve as baselines, and a total of 9 teams competed in the final phase of the challenge. This report presents a qualitative comparison of the proposed solutions together with the benchmark for the challenge.
Submitted 17 October, 2022;
originally announced October 2022.
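Since the challenge ranks solutions by PSNR against the original images, a minimal PSNR sketch follows. It assumes 8-bit pixels (peak value 255) and a single MSE over all pixels and channels of one image, given as flat lists; the organizers' exact evaluation protocol (color space, per-image averaging) is not stated in the abstract and may differ.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """PSNR in dB between two equally sized images given as flat pixel lists."""
    if len(reference) != len(restored):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


print(psnr([0, 0], [255, 255]))  # 0.0
```

A higher value means the filter-removed output is closer, pixel by pixel, to the unfiltered original.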
-
Estimating Brain Age with Global and Local Dependencies
Authors:
Yanwu Yang,
Xutao Guo,
Zhikai Chang,
Chenfei Ye,
Yang Xiang,
Haiyan Lv,
Ting Ma
Abstract:
Brain age has been shown to be a phenotype relevant to cognitive performance and brain disease. Accurate brain age prediction is an essential prerequisite for optimizing the predicted brain-age difference as a biomarker. As a comprehensive biological characteristic, brain age is hard to estimate accurately with models based on feature engineering or on local processing, such as local convolution and recurrent operations that process one local neighborhood at a time. In contrast, Vision Transformers learn global attentive interactions among patch tokens, introducing less inductive bias and modeling long-range dependencies. Accordingly, we propose a novel network that learns brain age from both global and local dependencies, where the corresponding representations are captured by a Successive Permuted Transformer (SPT) and convolution blocks, respectively. The SPT brings computational efficiency and captures 3D spatial information indirectly by continuously encoding 2D slices from different views. Finally, we collect a large cohort of 22645 subjects aged 14 to 97, on which our network performs best among a series of deep learning methods, yielding a mean absolute error (MAE) of 2.855 on the validation set and 2.911 on an independent test set.
Submitted 19 September, 2022;
originally announced September 2022.
-
Terahertz-Band Near-Space Communications: From a Physical-Layer Perspective
Authors:
Tianqi Mao,
Leyi Zhang,
Zhenyu Xiao,
Zhu Han,
Xiang-Gen Xia
Abstract:
Facilitated by the rapid technological development of near-space platform stations (NSPS), near-space communication (NS-COM) is envisioned to play a pivotal role in the space-air-ground integrated network for sixth-generation (6G) communications and beyond. In NS-COM, ultra-broadband wireless connectivity between NSPSs and various airborne/spaceborne platforms is required for a plethora of bandwidth-consuming applications, such as NSPS-based ad hoc networking, in-flight Internet and relaying technology. However, such a requirement seems to conflict with the scarcity of spectrum resources at conventional microwave frequencies, which motivates the exploitation of the terahertz (THz) band ranging from 0.1 to 10 THz. Owing to the huge available bandwidth, THz signals are capable of supporting ultra-high-rate data transmission for NS-COM above 100 Gb/s, and they are naturally suited to the near-space environment, where path loss is marginal. To this end, this article provides an extensive investigation of THz-band NS-COM (THz-NS-COM) from a physical-layer perspective. First, we summarize the potential applications of THz communications in the near-space environment and analyze the corresponding technical barriers. Then the channel characteristics of THz-NS-COM and the corresponding modeling strategies are discussed. Afterwards, three essential research directions are investigated to overcome the technical barriers of THz-NS-COM, i.e., robust beamforming for ultra-massive antenna arrays, signal processing algorithms against hybrid distortions, and integrated sensing and communications. Several open problems are also provided to unleash the full potential of THz-NS-COM.
Submitted 25 July, 2022;
originally announced July 2022.
-
LEO Satellite Access Network (LEO-SAN) Towards 6G: Challenges and Approaches
Authors:
Zhenyu Xiao,
Junyi Yang,
Tianqi Mao,
Chong Xu,
Rui Zhang,
Zhu Han,
Xiang-Gen Xia
Abstract:
With the rapid development of satellite communication technologies, the space-based access network has been envisioned as a promising complementary part of the future 6G network. Aside from terrestrial base stations, satellite nodes, especially low-earth-orbit (LEO) satellites, can also serve as base stations for Internet access and constitute the LEO-satellite-based access network (LEO-SAN). LEO-SAN is expected to provide seamless massive access and extended coverage with high signal quality. However, its practical implementation still faces significant technical challenges, e.g., the high mobility and limited communication payload budget of LEO satellite nodes. This paper aims to reveal the main technical issues that have not been fully addressed by existing LEO-SAN designs in three aspects, namely random access, beam management and Doppler-resistant transmission technologies. More specifically, the critical issues of random access in LEO-SAN are discussed regarding low flexibility, long transmission delay, and inefficient handshakes. Then the beam management for LEO-SAN is investigated in complex propagation environments under the constraints of high mobility and limited payload budget. Furthermore, the influence of Doppler shifts on LEO-SAN is explored. Correspondingly, promising technologies to address these challenges are also discussed. Finally, future research directions are envisioned.
Submitted 24 July, 2022;
originally announced July 2022.
-
Near Space Communications (NS-COM): A New Regime in Space-Air-Ground Integrated Network (SAGIN)
Authors:
Zhenyu Xiao,
Tianqi Mao,
Zhu Han,
Xiang-Gen Xia
Abstract:
Precipitated by the technological innovations of near-space platform stations (NSPS), the near space communication (NS-COM) network has emerged as an indispensable part of the next-generation space-air-ground integrated network (SAGIN) that facilitates ubiquitous coverage and broadband data transfer. This paper aims to provide a comprehensive overview of NS-COM. First, we investigate the differences between NS-COM and existing terrestrial cellular networks as well as satellite-based and unmanned-aerial-vehicle (UAV)-based communication networks, followed by a review of NS-COM development. Then, we explore the unique characteristics of NS-COM regarding the platforms and the propagation environment of the near space. The main issues of NS-COM, resulting from the extremely long transmission distance, the limitations of the communication payloads on NSPS and the complex atmospheric constitution of the near space, are identified. Next, various application scenarios of NS-COM are discussed, and their special technical requirements are revealed, from physical-layer aspects like transceiver design to upper-layer aspects like computational offloading and NSPS placement. Furthermore, we investigate the co-existence of NS-COM and ground networks by treating each other as interferers or collaborators. Finally, we list several potential technologies for NS-COM from the perspective of spectrum usage, and highlight their technical challenges for future research.
Submitted 24 July, 2022;
originally announced July 2022.
-
Ultrathin, high-speed, all-optical photoacoustic endomicroscopy probe for guiding minimally invasive surgery
Authors:
Tianrui Zhao,
Truc Thuy Pham,
Christian Baker,
Michelle T. Ma,
Sebastien Ourselin,
Tom Vercauteren,
Edward Zhang,
Paul C. Beard,
Wenfeng Xia
Abstract:
Photoacoustic (PA) endoscopy has shown significant potential for clinical diagnosis and surgical guidance. Multimode fibres (MMFs) are becoming increasingly attractive for the development of miniature endoscopy probes owing to their ultrathin size, low cost and the diffraction-limited spatial resolution enabled by wavefront shaping. However, current MMF-based PA endomicroscopy probes are limited by either a bulky ultrasound detector or a low imaging speed, which hinders their usability. In this work, we report the development of a highly miniaturised, high-speed PA endomicroscopy probe that is integrated within the cannula of a 20 gauge medical needle. This probe comprises an MMF for delivering the PA excitation light and a single-mode optical fibre with a plano-concave microresonator for ultrasound detection. Wavefront shaping with a digital micromirror device enabled rapid raster-scanning of a focused light spot at the distal end of the MMF for tissue interrogation. High-resolution PA imaging of mouse red blood cells covering an area 100 microns in diameter was achieved with the needle probe at ~3 frames per second. Mosaic imaging was performed after fibre characterisation by translating the needle probe to enlarge the field of view in real time. The developed ultrathin PA endomicroscopy probe is promising for guiding minimally invasive surgery by providing functional, molecular and microstructural information of tissue in real time.
Submitted 6 May, 2022;
originally announced May 2022.
-
Label conditioned segmentation
Authors:
Tianyu Ma,
Benjamin C. Lee,
Mert R. Sabuncu
Abstract:
Semantic segmentation is an important task in computer vision that is often tackled with convolutional neural networks (CNNs). A CNN learns to produce pixel-level predictions through training on pairs of images and their corresponding ground-truth segmentation labels. For segmentation tasks with multiple classes, the standard approach is to use a network that computes a multi-channel probabilistic segmentation map, with each channel representing one class. In applications where the image grid size (e.g., when it is a 3D volume) and/or the number of labels is relatively large, the standard (baseline) approach can become prohibitively expensive for the available computational resources. In this paper, we propose a simple yet effective method to address this challenge. In our approach, the segmentation network produces a single-channel output while being conditioned on a single class label, which determines the output class of the network. Our method, called label conditioned segmentation (LCS), can be used to segment images with a very large number of classes, which might be infeasible for the baseline approach. We also demonstrate in the experiments that label conditioning can improve the accuracy of a given backbone architecture, likely thanks to its parameter efficiency. Finally, as we show in our results, an LCS model can produce previously unseen fine-grained labels at inference time, when only coarse labels were available during training. We provide all of our code here: https://github.com/tym002/Label-conditioned-segmentation
Submitted 17 March, 2022;
originally announced March 2022.
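A minimal sketch of how a label-conditioned model can still produce a full multi-class map, assuming a hypothetical `toy_model(image, label)` that returns one single-channel score map per call (a stand-in for illustration, not the authors' network; their code is in the repository linked in the abstract). Each label is queried in turn and a running per-pixel argmax is kept.

```python
from typing import Callable, List

Image = List[List[float]]
ScoreMap = List[List[float]]


def toy_model(image: Image, label: int) -> ScoreMap:
    """Hypothetical stand-in for a label-conditioned network.

    Scores each pixel by how close its intensity is to the label index;
    a real LCS model would be a CNN that receives the label as a conditioning input.
    """
    return [[-abs(px - label) for px in row] for row in image]


def segment_all_labels(image: Image, num_labels: int,
                       model: Callable[[Image, int], ScoreMap]) -> List[List[int]]:
    """Query the single-channel model once per label and keep a per-pixel argmax.

    Only the running best score and label are stored, so memory does not grow
    with the number of labels.
    """
    best_label = [[0] * len(row) for row in image]
    best_score = [[float("-inf")] * len(row) for row in image]
    for label in range(num_labels):
        scores = model(image, label)  # single-channel output for this label
        for i, row in enumerate(scores):
            for j, s in enumerate(row):
                if s > best_score[i][j]:
                    best_score[i][j] = s
                    best_label[i][j] = label
    return best_label


image = [[0.1, 1.9], [2.2, 0.8]]
print(segment_all_labels(image, num_labels=3, model=toy_model))  # → [[0, 2], [2, 1]]
```

The trade-off in this sketch is compute for memory: the model runs once per label, but only one single-channel map is held at a time.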
-
Hyper-Convolutions via Implicit Kernels for Medical Imaging
Authors:
Tianyu Ma,
Alan Q. Wang,
Adrian V. Dalca,
Mert R. Sabuncu
Abstract:
The convolutional neural network (CNN) is one of the most commonly used architectures for computer vision tasks. The key building block of a CNN is the convolutional kernel that aggregates information from the pixel neighborhood and shares weights across all pixels. A standard CNN's capacity, and thus its performance, is directly related to the number of learnable kernel weights, which is determined by the number of channels and the kernel size (support). In this paper, we present the hyper-convolution, a novel building block that implicitly encodes the convolutional kernel using spatial coordinates. Hyper-convolutions decouple kernel size from the total number of learnable parameters, enabling a more flexible architecture design. We demonstrate in our experiments that replacing regular convolutions with hyper-convolutions can improve performance with fewer parameters and increase robustness against noise. We provide our code here: https://github.com/tym002/Hyper-Convolution
Submitted 5 February, 2022;
originally announced February 2022.
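A toy sketch of the hyper-convolution idea, assuming a one-hidden-layer tanh MLP as the kernel generator and kernel coordinates normalized to [-1, 1] (both are illustrative choices, not taken from the paper). The generator's parameter count depends only on its hidden width, so the same generator can emit 3x3, 7x7, or 15x15 kernels.

```python
import math
import random
from typing import List


class HyperKernel:
    """Hypothetical sketch: an MLP mapping a 2-D kernel coordinate to one kernel weight.

    The number of learnable parameters depends only on `hidden`,
    not on the kernel size that is sampled from it.
    """

    def __init__(self, hidden: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(hidden)]
        self.b1 = [rng.gauss(0, 1) for _ in range(hidden)]
        self.w2 = [rng.gauss(0, 1) for _ in range(hidden)]
        self.b2 = rng.gauss(0, 1)

    def num_params(self) -> int:
        return sum(len(r) for r in self.w1) + len(self.b1) + len(self.w2) + 1

    def weight(self, u: float, v: float) -> float:
        """Evaluate the implicit kernel at coordinate (u, v)."""
        h = [math.tanh(w[0] * u + w[1] * v + b) for w, b in zip(self.w1, self.b1)]
        return sum(a * x for a, x in zip(self.w2, h)) + self.b2

    def kernel(self, size: int) -> List[List[float]]:
        """Sample the implicit kernel on a size x size grid of coordinates in [-1, 1]."""
        coords = [-1.0 + 2.0 * i / (size - 1) for i in range(size)] if size > 1 else [0.0]
        return [[self.weight(u, v) for v in coords] for u in coords]


hk = HyperKernel(hidden=8)
for k in (3, 7, 15):
    K = hk.kernel(k)
    print(k, len(K), len(K[0]), hk.num_params())  # parameter count stays 33
```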
-
Survey of charging scheduling, fleet management, and location planning of charging stations for electrified demand-responsive transport systems: methodologies and recent developments
Authors:
Tai-Yu Ma,
Yumeng Fang
Abstract:
The accelerated electrification of transport systems with EVs has brought new challenges for charging scheduling, fleet management, and charging infrastructure location and configuration planning. In this review, we systematically survey recent developments in strategic, tactical, and operational decisions for demand-responsive transport system planning using electric vehicles (EV-DRT). We summarize recent developments in mathematical modeling approaches and identify future research directions. A list of existing open-access datasets, numerical test instances, and software is provided for future research in EV-DRT and related problems.
Submitted 8 December, 2021;
originally announced December 2021.
-
How will electric vehicles affect traffic congestion and energy consumption: an integrated modelling approach
Authors:
Artur Grigorev,
Tuo Mao,
Adam Berry,
Joachim Tan,
Loki Purushothaman,
Adriana-Simona Mihaita
Abstract:
This paper explores the impact of electric vehicles (EVs) on traffic congestion and energy consumption by proposing an integrated bi-level framework comprising: a) a dynamic micro-scale traffic simulation suitable for modelling current and hypothetical traffic and charging demand scenarios, and b) a queue model for capturing the impact of fast charging station use, informed by traffic flows, travel distances, availability of charging infrastructure and estimated vehicle battery state of charge. To the best of our knowledge, this paper represents the first integrated analysis of potential traffic congestion and energy infrastructure impacts linked to EV uptake, based on real traffic flows and the placement and design of existing fast-charging infrastructure. Results show that the integrated queue-energy-transport modelling framework can correctly predict the limitations of the EV infrastructure as well as the evolution of traffic congestion. The modelling approach identifies concrete pain points to be addressed in both traffic and energy management and planning. The code for this project can be found at: https://github.com/Future-Mobility-Lab/EV-charging-impact
Submitted 26 October, 2021;
originally announced October 2021.
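The abstract does not specify the queue model, so, as an illustration only, the sketch below assumes that a station with identical fast chargers behaves as an M/M/c queue and uses the Erlang C formula to estimate how long an arriving EV waits for a free charger. The example rates are hypothetical.

```python
import math


def erlang_c(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Probability that an arriving EV must wait, assuming an M/M/c queue (Erlang C)."""
    a = arrival_rate / service_rate  # offered load in Erlangs
    rho = a / servers  # per-charger utilisation
    if rho >= 1:
        raise ValueError("unstable queue: arrivals exceed total charging capacity")
    top = a ** servers / math.factorial(servers) / (1 - rho)
    bottom = sum(a ** k / math.factorial(k) for k in range(servers)) + top
    return top / bottom


def mean_wait(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Mean queueing delay before charging starts (same time unit as the rates)."""
    p_wait = erlang_c(arrival_rate, service_rate, servers)
    return p_wait / (servers * service_rate - arrival_rate)


# Hypothetical station: 4 EVs/hour arrive, 30-minute sessions (2/hour), 3 chargers.
print(round(erlang_c(4, 2, 3), 4), round(mean_wait(4, 2, 3) * 60, 1))  # → 0.4444 13.3
```

With a single charger the formula reduces to the familiar M/M/1 result, where the waiting probability equals the utilisation.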
-
Uncertainty Quantification in Medical Image Segmentation with Multi-decoder U-Net
Authors:
Yanwu Yang,
Xutao Guo,
Yiwei Pan,
Pengcheng Shi,
Haiyan Lv,
Ting Ma
Abstract:
Accurate medical image segmentation is crucial for diagnosis and analysis. However, models without calibrated uncertainty estimates might lead to errors in downstream analysis and exhibit low levels of robustness. Estimating the uncertainty in the measurement is vital to making definite, informed conclusions. In particular, ambiguous areas and focus (lesion) boundaries are difficult to predict accurately for both models and radiologists, and it is even harder to reach a consensus among multiple annotations. In this work, we study the uncertainty in these areas, which carries significant information about anatomical structure and is as important as segmentation performance. We quantify medical image segmentation uncertainty by measuring segmentation performance against multiple annotations in a supervised learning manner, and propose a U-Net based architecture with multiple decoders, where the image representation is encoded with a shared encoder and the segmentation corresponding to each annotation is estimated by a separate decoder. In addition, a cross-loss function is proposed for bridging the gap between the different branches. The proposed architecture is trained end-to-end and improves predictive uncertainty estimates. With fewer parameters, the model achieves performance comparable to that of the integrated training model that ranked runner-up in the MICCAI-QUBIQ 2020 challenge.
Submitted 14 September, 2021;
originally announced September 2021.
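One simple way to read an uncertainty estimate off several decoder heads, assuming each head outputs a per-pixel foreground probability map (one head per annotator): take the per-pixel mean as the consensus and the per-pixel variance as disagreement. This is a generic sketch; the abstract does not state the paper's exact uncertainty measure or the form of its cross-loss.

```python
from statistics import mean, pvariance
from typing import List, Tuple

ProbMap = List[List[float]]


def aggregate_decoders(maps: List[ProbMap]) -> Tuple[ProbMap, ProbMap]:
    """Per-pixel mean and population variance across decoder outputs.

    Assumption for illustration: high variance marks pixels where the
    annotator-specific decoders disagree, i.e. ambiguous regions.
    """
    rows, cols = len(maps[0]), len(maps[0][0])
    mean_map = [[mean(m[i][j] for m in maps) for j in range(cols)] for i in range(rows)]
    var_map = [[pvariance([m[i][j] for m in maps]) for j in range(cols)] for i in range(rows)]
    return mean_map, var_map


# Three hypothetical decoder heads on a 1 x 3 image; they disagree only on the middle pixel.
decoders = [[[0.9, 0.2, 0.1]], [[0.9, 0.8, 0.1]], [[0.9, 0.5, 0.1]]]
mu, var = aggregate_decoders(decoders)
print(mu, var)
```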
-
Learning Barrier Certificates: Towards Safe Reinforcement Learning with Zero Training-time Violations
Authors:
Yuping Luo,
Tengyu Ma
Abstract:
Training-time safety violations have been a major concern when deploying reinforcement learning algorithms in the real world. This paper explores the possibility of safe RL algorithms with zero training-time safety violations in the challenging setting where we are only given a safe but trivial-reward initial policy, without any prior knowledge of the dynamics model or additional offline data. We propose an algorithm, Co-trained Barrier Certificate for Safe RL (CRABS), which iteratively learns barrier certificates, dynamics models, and policies. The barrier certificates, learned via adversarial training, ensure the policy's safety assuming a calibrated learned dynamics model. We also add a regularization term to encourage larger certified regions to enable better exploration. Empirical simulations show that achieving zero safety violations is already challenging for a suite of simple environments with only 2- to 4-dimensional state spaces, especially if high-reward policies have to visit regions near the safety boundary. Prior methods require hundreds of violations to achieve decent rewards on these tasks, whereas our proposed algorithms incur zero violations.
Submitted 10 March, 2022; v1 submitted 4 August, 2021;
originally announced August 2021.
-
Terahertz Wireless Communications with Flexible Index Modulation Aided Pilot Design
Authors:
Tianqi Mao,
Zhaocheng Wang
Abstract:
Terahertz (THz) wireless communication is envisioned as a promising technology capable of providing ultra-high-rate transmission of up to terabits per second. However, some hardware imperfections, which are generally neglected in the existing literature concerning lower data rates and traditional operating frequencies, cannot be overlooked in THz systems. Hardware imperfections usually consist of phase noise, in-phase/quadrature imbalance, and power amplifier nonlinearity. Because phase noise is time-variant, frequent pilot insertion is required, which decreases spectral efficiency. To address this issue, this paper proposes a novel pilot design strategy based on index modulation (IM), where the positions of pilots are flexibly changed in the data frame, and additional information bits can be conveyed by the indices of the pilots. Furthermore, a turbo receiving algorithm is developed, which jointly performs the detection of pilot indices and channel estimation in an iterative manner. It is shown that the proposed turbo receiver works well even when the prior knowledge of channel state information is outdated. Analytical and simulation results validate that the proposed schemes achieve significant enhancement of bit-error rate performance and channel estimation accuracy, whilst attaining higher spectral efficiency than their classical counterpart.
Submitted 22 June, 2021;
originally announced June 2021.
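A sketch of how pilot positions can carry extra bits, assuming K pilots placed among N candidate slots and the combinatorial number system as the bits-to-pattern mapping (the paper's actual mapping rule may differ). With N = 8 and K = 2 there are C(8, 2) = 28 possible patterns, so floor(log2 28) = 4 extra bits per frame can be conveyed.

```python
from math import comb, floor, log2
from typing import List


def bits_per_pattern(n_slots: int, n_pilots: int) -> int:
    """Extra bits carried by choosing which n_pilots of n_slots hold pilots."""
    return floor(log2(comb(n_slots, n_pilots)))


def index_to_positions(index: int, n_slots: int, n_pilots: int) -> List[int]:
    """Transmitter side (assumed mapping): integer -> sorted pilot positions.

    Uses the combinatorial number system: greedily pick the largest slot c
    with comb(c, k) <= remaining index, for k = n_pilots, ..., 1.
    """
    positions = []
    k = n_pilots
    for c in range(n_slots - 1, -1, -1):
        if k == 0:
            break
        if comb(c, k) <= index:
            index -= comb(c, k)
            positions.append(c)
            k -= 1
    return sorted(positions)


def positions_to_index(positions: List[int]) -> int:
    """Receiver side: detected pilot positions -> integer (inverse mapping)."""
    return sum(comb(c, k + 1) for k, c in enumerate(sorted(positions)))


bits = "1011"
pattern = index_to_positions(int(bits, 2), n_slots=8, n_pilots=2)
print(pattern)  # → [1, 5]
```

Only 16 of the 28 patterns are used here, since the bit count is rounded down to a power of two.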
-
Federated Learning for Internet of Things: A Federated Learning Framework for On-device Anomaly Data Detection
Authors:
Tuo Zhang,
Chaoyang He,
Tianhao Ma,
Lei Gao,
Mark Ma,
Salman Avestimehr
Abstract:
Federated learning can be a promising solution for enabling IoT cybersecurity (i.e., anomaly detection in the IoT environment) while preserving data privacy and mitigating the high communication/storage overhead (e.g., high-frequency data from time-series sensors) of centralized over-the-cloud approaches. In this paper, to further push forward this direction with a comprehensive study of both algorithm and system design, we build the FedIoT platform, which contains the FedDetect algorithm for on-device anomaly data detection and a system design for realistic evaluation of federated learning on IoT devices. Furthermore, the proposed FedDetect learning framework improves performance by utilizing a local adaptive optimizer (e.g., Adam) and a cross-round learning rate scheduler. In a network of realistic IoT devices (Raspberry Pi), we evaluate the FedIoT platform and the FedDetect algorithm in terms of both model and system performance. Our results demonstrate the efficacy of federated learning in detecting a wider range of attack types occurring at multiple devices. The system efficiency analysis indicates that both end-to-end training time and memory cost are affordable and promising for resource-constrained IoT devices. The source code is publicly available at https://github.com/FedML-AI/FedIoT.
Submitted 18 October, 2021; v1 submitted 15 June, 2021;
originally announced June 2021.
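The abstract names a local adaptive optimizer and a cross-round learning rate scheduler but not the server aggregation rule. The sketch below assumes standard FedAvg (sample-weighted averaging of client parameters) and a hypothetical step-decay schedule across rounds; it is not the FedIoT source code.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def fedavg(client_weights: List[Weights], num_samples: List[int]) -> Weights:
    """Assumed FedAvg aggregation: average client parameters weighted by local sample count."""
    total = sum(num_samples)
    first = client_weights[0]
    return {
        key: [sum(w[key][i] * n for w, n in zip(client_weights, num_samples)) / total
              for i in range(len(first[key]))]
        for key in first
    }


def round_lr(base_lr: float, round_idx: int, decay: float = 0.5, every: int = 10) -> float:
    """Hypothetical cross-round schedule: the clients' local learning rate halves every 10 rounds."""
    return base_lr * decay ** (round_idx // every)


global_model = fedavg([{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}], num_samples=[100, 300])
print(global_model, round_lr(0.01, 25))
```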
-
Waveform Design for Joint Sensing and Communications in Millimeter-Wave and Low Terahertz Bands
Authors:
Tianqi Mao,
Jiaxuan Chen,
Qi Wang,
Chong Han,
Zhaocheng Wang,
George K. Karagiannidis
Abstract:
The convergence of sensing and communication in the millimeter-wave (mmWave) and low terahertz (THz) bands has been envisioned as a promising technology, since it incorporates high-rate data transmission of hundreds of Gbps and mm-level radar sensing in a spectrum- and cost-efficient manner by sharing both frequency and hardware resources. However, the joint radar sensing and communication (JRC) system faces considerable challenges in the mmWave and low-THz bands, due to the peculiarities of the propagation channel and radio-frequency (RF) front ends. To this end, this paper investigates waveform design for ultra-broadband JRC systems in the mmWave and low-THz bands. First, considering a JRC design based on the co-existence concept, where both functions operate in a time-domain duplex (TDD) manner, a novel multi-subband quasi-perfect (MS-QP) sequence, composed of multiple perfect subsequences on different subbands, is proposed for target sensing; it achieves accurate target ranging and velocity estimation whilst requiring only cost-efficient low-rate analog-to-digital converters (A/Ds) for sequence detection. Furthermore, the root index of each perfect subsequence is designed to eliminate the influence of strong Doppler shift on radar sensing. Finally, a data-embedded MS-QP (DE-MS-QP) waveform is constructed through time-domain extension of the MS-QP sequence, generating null frequency points on each subband for data transmission. Unlike the co-existence-based JRC system operating in TDD manner, the proposed DE-MS-QP waveform enables simultaneous interference-free sensing and communication, whilst inheriting all the merits of MS-QP sequences. Numerical results validate the superiority of the proposed waveforms in terms of communication and sensing performance, hardware cost, and flexibility of resource allocation between the dual functions.
Submitted 26 December, 2022; v1 submitted 2 June, 2021;
originally announced June 2021.
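The abstract builds its sensing sequences from perfect subsequences selected by a root index. Zadoff-Chu sequences are one well-known family of perfect sequences parameterized by a root index, so the sketch below assumes them for illustration: for odd length N and a root coprime to N, the periodic autocorrelation equals N at zero lag and vanishes at every other lag. The multi-subband MS-QP construction itself is not reproduced here.

```python
import cmath
from math import gcd
from typing import List


def zadoff_chu(root: int, length: int) -> List[complex]:
    """Zadoff-Chu sequence exp(-j*pi*u*n*(n+1)/N), assumed here as the 'perfect' subsequence."""
    if length % 2 == 0 or gcd(root, length) != 1:
        raise ValueError("this sketch assumes odd length and a root coprime to the length")
    return [cmath.exp(-1j * cmath.pi * root * n * (n + 1) / length) for n in range(length)]


def periodic_autocorr(seq: List[complex], lag: int) -> complex:
    """Cyclic autocorrelation: sum over n of x[n] * conj(x[(n + lag) mod N])."""
    N = len(seq)
    return sum(seq[n] * seq[(n + lag) % N].conjugate() for n in range(N))


zc = zadoff_chu(root=5, length=63)
print(round(abs(periodic_autocorr(zc, 0)), 6))  # → 63.0
print(max(abs(periodic_autocorr(zc, k)) for k in range(1, 63)))  # near 0 (rounding error only)
```

The ideal sidelobes are what make such sequences attractive for ranging; every element also has unit magnitude, which keeps the peak-to-average power ratio low.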
-
Hyper-Convolution Networks for Biomedical Image Segmentation
Authors:
Tianyu Ma,
Adrian V. Dalca,
Mert R. Sabuncu
Abstract:
The convolution operation is a central building block of neural network architectures widely used in computer vision. The size of the convolution kernels determines both the expressiveness of convolutional neural networks (CNNs) and the number of learnable parameters. Increasing the network capacity to capture rich pixel relationships requires increasing the number of learnable parameters, often leading to overfitting and/or lack of robustness. In this paper, we propose a powerful novel building block, the hyper-convolution, which implicitly represents the convolution kernel as a function of kernel coordinates. Hyper-convolutions enable decoupling the kernel size, and hence its receptive field, from the number of learnable parameters. In our experiments, focused on challenging biomedical image segmentation tasks, we demonstrate that replacing regular convolutions with hyper-convolutions leads to more efficient architectures that achieve improved accuracy. Our analysis also shows that learned hyper-convolutions are naturally regularized, which can offer better generalization performance. We believe that hyper-convolutions can be a powerful building block in future neural network architectures for computer vision tasks. We provide all of our code here: https://github.com/tym002/Hyper-Convolution
Submitted 6 October, 2022; v1 submitted 21 May, 2021;
originally announced May 2021.
-
Towards Unbiased COVID-19 Lesion Localisation and Segmentation via Weakly Supervised Learning
Authors:
Yang Yang,
Jiancong Chen,
Ruixuan Wang,
Ting Ma,
Lingwei Wang,
Jie Chen,
Wei-Shi Zheng,
Tong Zhang
Abstract:
Despite tremendous efforts, it remains very challenging to build a robust model that assists in the accurate quantitative assessment of COVID-19 on chest CT images. Because lesion boundaries are blurred, supervised segmentation methods usually suffer from annotation biases. To support unbiased lesion localisation and to minimise labeling costs, we propose a data-driven framework supervised by only image-level labels. The framework can explicitly separate potential lesions from original images, with the help of a generative adversarial network and a lesion-specific decoder. Experiments on two COVID-19 datasets demonstrate the effectiveness of the proposed framework and its superior performance to several existing methods.
Submitted 1 March, 2021;
originally announced March 2021.
-
Enriching Under-Represented Named-Entities To Improve Speech Recognition Performance
Authors:
Tingzhi Mao,
Yerbolat Khassanov,
Van Tung Pham,
Haihua Xu,
Hao Huang,
Aishan Wumaier,
Eng Siong Chng
Abstract:
Automatic speech recognition (ASR) for under-represented named entities (UR-NEs) is challenging because such named entities (NEs) have insufficient instances and poor contextual coverage in the training data to learn reliable estimates and representations. In this paper, we propose approaches to enriching UR-NEs to improve speech recognition performance. Specifically, our first priority is to ensure that UR-NEs appear in the word lattice if any are present. To this end, we make exemplar utterances for those UR-NEs according to their categories (e.g., location, person, organization), resulting in an improved language model (LM) that boosts UR-NE occurrence in the word lattice. With more UR-NEs appearing in the lattice, we then boost recognition performance through lattice rescoring methods. We first enrich the representations of UR-NEs in a pre-trained recurrent neural network LM (RNNLM) by borrowing the embedding representations of rich-represented NEs (RR-NEs), yielding lattices that statistically favor the UR-NEs. Finally, we directly boost the likelihood scores of the utterances containing UR-NEs and gain further performance improvement.
Submitted 22 October, 2020;
originally announced October 2020.
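A simplified sketch of the final step (boosting the scores of hypotheses that contain UR-NEs), assuming an n-best list instead of a full lattice, single-word entities, and a hypothetical fixed log-score bonus per entity occurrence. The entity set and scores below are made up for illustration.

```python
from typing import List, Set, Tuple


def rescore_nbest(nbest: List[Tuple[str, float]], ur_nes: Set[str],
                  bonus: float = 2.0) -> List[Tuple[str, float]]:
    """Add a fixed log-score bonus per UR-NE occurrence, then re-rank best-first.

    `bonus` is a hypothetical tuning constant; in practice it would be tuned on dev data.
    Only single-token entities are matched in this sketch.
    """
    rescored = []
    for hyp, score in nbest:
        hits = sum(1 for word in hyp.split() if word in ur_nes)
        rescored.append((hyp, score + bonus * hits))
    return sorted(rescored, key=lambda item: item[1], reverse=True)


nbest = [("meet me at jurong east", -12.0), ("meet me at your wrong east", -11.5)]
print(rescore_nbest(nbest, {"jurong"})[0][0])  # → meet me at jurong east
```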
-
The NTU-AISG Text-to-speech System for Blizzard Challenge 2020
Authors:
Haobo Zhang,
Tingzhi Mao,
Haihua Xu,
Hao Huang
Abstract:
We report our NTU-AISG text-to-speech (TTS) entry systems for the Blizzard Challenge 2020 in this paper. There are two TTS tasks in this year's challenge: a Mandarin TTS task and a Shanghai dialect TTS task. We participated in both. One of the main challenges is to build TTS systems under low-resource constraints, particularly for the Shanghai dialect, for which about three hours of data are available to participants. To overcome this constraint, we adopt an average-speaker modeling method. That is, we first employ external Mandarin data to train both an end-to-end acoustic model and a WaveNet vocoder, then use Shanghai dialect data to fine-tune the acoustic model and the WaveNet vocoder respectively. In addition, we have no Shanghai dialect lexicon, although syllable transcripts are provided for the training data. Since during the training stage we were not sure whether similar syllable transcripts would be provided for the evaluation data, we use a Mandarin lexicon for the Shanghai dialect instead. With letters decomposed from the corresponding Mandarin syllables as input, the naturalness and original-speaker similarity of the synthesized speech are good, but subjective evaluation results indicate that the intelligibility of the synthesized speech is severely undermined for the Shanghai dialect TTS system.
Submitted 22 October, 2020;
originally announced October 2020.
-
Ensembling Low Precision Models for Binary Biomedical Image Segmentation
Authors:
Tianyu Ma,
Hang Zhang,
Hanley Ong,
Amar Vora,
Thanh D. Nguyen,
Ajay Gupta,
Yi Wang,
Mert Sabuncu
Abstract:
Segmentation of anatomical regions of interest such as vessels or small lesions in medical images is still a difficult problem that is often tackled with manual input by an expert. One of the major challenges for this task is that the appearance of foreground (positive) regions can be similar to background (negative) regions. As a result, many automatic segmentation algorithms tend to exhibit asymmetric errors, typically producing more false positives than false negatives. In this paper, we aim to leverage this asymmetry and train a diverse ensemble of models with very high recall, while sacrificing their precision. Our core idea is straightforward: a diverse ensemble of low-precision, high-recall models is likely to make different false positive errors (classifying background as foreground in different parts of the image), but the true positives will tend to be consistent. Thus, in aggregate the false positive errors will cancel out, yielding high performance for the ensemble. Our strategy is general and can be applied with any segmentation model. In three different applications (carotid artery segmentation in neck CT angiography, myocardium segmentation in cardiovascular MRI, and multiple sclerosis lesion segmentation in brain MRI), we show how the proposed approach can significantly boost the performance of a baseline segmentation method.
Submitted 16 October, 2020;
originally announced October 2020.
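A sketch of the aggregation step, assuming binary masks and a pixel-wise majority vote (the abstract does not specify the exact combination rule). In the toy example each high-recall model recovers both true foreground pixels but adds a false positive in a different place, so the vote removes all three false positives.

```python
from typing import List

Mask = List[List[int]]


def majority_vote(masks: List[Mask]) -> Mask:
    """Assumed combination rule: a pixel is foreground if more than half the models say so."""
    n = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    return [[int(sum(m[i][j] for m in masks) * 2 > n) for j in range(cols)]
            for i in range(rows)]


truth = [[0, 1, 1, 0, 0]]
# Three hypothetical high-recall, low-precision models with non-overlapping false positives.
models = [[[1, 1, 1, 0, 0]], [[0, 1, 1, 1, 0]], [[0, 1, 1, 0, 1]]]
print(majority_vote(models) == truth)  # → True
```

The cancellation only works if the false positives are diverse; models that share the same false positive would outvote it back in.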
-
Physics-Informed Gaussian Process Regression for Probabilistic States Estimation and Forecasting in Power Grids
Authors:
Tong Ma,
David Alonso Barajas-Solano,
Ramakrishna Tipireddy,
Alexandre M. Tartakovsky
Abstract:
Real-time state estimation and forecasting are critical for the efficient operation of power grids. In this paper, a physics-informed Gaussian process regression (PhI-GPR) method is presented and used for probabilistic forecasting and estimation of the phase angle, angular speed, and wind mechanical power of a three-generator power grid system using sparse measurements. In standard data-driven Gaussian process regression (GPR), parameterized models for the prior statistics are fit by maximizing the marginal likelihood of observed data, whereas in PhI-GPR, we compute the prior statistics by solving stochastic equations governing power grid dynamics. The short-term forecast of a power grid system dominated by wind generation is complicated by the stochastic nature of the wind and the resulting uncertain mechanical wind power. Here, we assume that the power-grid dynamics are governed by the swing equations, and we treat the unknown terms in the swing equations (specifically, the mechanical wind power) as random processes, which turns these equations into stochastic differential equations. We solve these equations for the mean and variance of the power grid system using Monte Carlo simulation. We demonstrate that the proposed PhI-GPR method can accurately forecast and estimate both observed and unobserved states, including the mean behavior and associated uncertainty. For observed states, we show that PhI-GPR provides a forecast comparable to the standard data-driven GPR, with both forecasts being significantly more accurate than the autoregressive integrated moving average (ARIMA) forecast. We also show that the ARIMA forecast is much more sensitive to observation frequency and measurement errors than the PhI-GPR forecast.
Submitted 9 October, 2020;
originally announced October 2020.
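A sketch of the Monte Carlo step only, assuming a single-machine swing equation with white-noise mechanical power and illustrative parameter values (the paper uses a three-generator system). It integrates sample paths with the Euler-Maruyama scheme and returns the rotor-angle mean and variance over time; in PhI-GPR such moments, together with covariances that are not computed here, would feed the Gaussian process prior.

```python
import math
import random
from statistics import fmean
from typing import List, Tuple


def mc_swing_moments(n_paths: int = 500, n_steps: int = 200, dt: float = 0.01,
                     m: float = 1.0, d: float = 0.5, p_e: float = 1.5,
                     p0: float = 0.8, sigma: float = 0.2,
                     seed: int = 0) -> Tuple[List[float], List[float]]:
    """Monte Carlo mean and variance of the rotor angle of one stochastic swing equation.

    Hypothetical single-machine model (all parameter values are illustrative):
        d(delta) = omega dt
        m d(omega) = (p0 - p_e sin(delta) - d omega) dt + sigma dW
    """
    rng = random.Random(seed)
    delta0 = math.asin(p0 / p_e)  # deterministic equilibrium angle
    paths = []
    for _ in range(n_paths):
        delta, omega = delta0, 0.0
        traj = [delta]
        for _ in range(n_steps):
            dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment
            domega = ((p0 - p_e * math.sin(delta) - d * omega) * dt + sigma * dw) / m
            delta += omega * dt
            omega += domega
            traj.append(delta)
        paths.append(traj)
    means, variances = [], []
    for t in range(n_steps + 1):
        vals = [p[t] for p in paths]
        mu = fmean(vals)
        means.append(mu)
        variances.append(fmean((v - mu) ** 2 for v in vals))
    return means, variances


means, variances = mc_swing_moments()
print(f"final mean {means[-1]:.3f} rad, variance {variances[-1]:.4f}")
```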
-
Two-stage battery recharge scheduling and vehicle-charger assignment policy for dynamic electric dial-a-ride services
Authors:
Tai-Yu Ma
Abstract:
Coordinating the charging scheduling of electric vehicles for dynamic dial-a-ride services is challenging, given charging queuing delays and stochastic customer demand. We propose a new two-stage solution approach to handle dynamic vehicle charging scheduling that minimizes the costs of the fleet's daily charging operations. The approach comprises two components: daily vehicle charging scheduling and online vehicle-charger assignment. A new battery charge scheduling model is proposed to obtain the vehicle charging schedules by minimizing the costs of vehicle daily charging operations while satisfying vehicle driving needs to serve customers. In the second stage, an online vehicle-charger assignment model is developed to minimize the total vehicle idle time for charging by considering queuing delays at the level of chargers. An efficient Lagrangian relaxation algorithm is proposed to solve the large-scale vehicle-charger assignment problem with small optimality gaps. The approach is applied to a realistic dynamic dial-a-ride service case study in Luxembourg and compared with a nearest-charging-station policy and a first-come-first-served minimum-charging-delay policy under different charging infrastructure scenarios. Our computational results show that the approach can achieve significant savings for the operator in terms of charging waiting times (-74.9%), charging times (-38.6%), and charged energy costs (-27.4%). A sensitivity analysis is conducted to evaluate the impact of the different model parameters, showing the scalability and robustness of the approach in a stochastic environment.
Submitted 15 April, 2021; v1 submitted 4 October, 2020;
originally announced October 2020.
-
Visual-speech Synthesis of Exaggerated Corrective Feedback
Authors:
Yaohua Bu,
Weijun Li,
Tianyi Ma,
Shengqi Chen,
Jia Jia,
Kun Li,
Xiaobo Lu
Abstract:
To provide more discriminative feedback that helps second language (L2) learners better identify their mispronunciations, we propose a method for exaggerated visual-speech feedback in computer-assisted pronunciation training (CAPT). The speech exaggeration is realized by an emphatic speech generation neural network based on Tacotron, while the visual exaggeration is accomplished by ADC Viseme Blending, namely increasing the Amplitude of movement, extending the phone's Duration, and enhancing the color Contrast. User studies show that exaggerated feedback outperforms the non-exaggerated version in helping learners with pronunciation identification and pronunciation improvement.
Submitted 15 December, 2020; v1 submitted 12 September, 2020;
originally announced September 2020.
-
Optimal fast charging station locations for electric ridesharing service with online vehicle-charging station assignment
Authors:
Tai-Yu Ma,
Simin Xie
Abstract:
Electrified shared mobility services need to plan charging infrastructure and manage their daily charging operations to minimize total charging operation time and cost. However, existing studies tend to address these problems separately. A new online vehicle-charging assignment model is proposed and integrated into the fast charging location problem for dynamic ridesharing services using electric vehicles. The latter is formulated as a bi-level optimization problem to minimize the fleet's daily charging operation time. A surrogate-assisted optimization approach is proposed to solve the combinatorial optimization problem efficiently. The proposed model is tested on a realistic flexible bus service in Luxembourg. The results show that the proposed online charging policy can effectively reduce the charging delays of the fleet compared to state-of-the-art methods. With 10 additional DC fast chargers installed, charging operation time can be reduced by up to 27.8% when applying the online charging policy under the test scenarios.
Submitted 12 October, 2020; v1 submitted 13 August, 2020;
originally announced August 2020.
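The abstract does not spell out the online charging policy itself. To make the idea of online vehicle-charging station assignment concrete, the sketch below uses a simple greedy rule that is an assumption, not the authors' policy: each vehicle requesting a charge is sent to the station where it would finish charging earliest, counting travel time, queueing at the earliest-free charger, and charging time. `Station`, `assign`, and all times are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Station:
    name: str
    busy_until: list  # per-charger time (min) at which each DC fast charger becomes free


def assign(request_time, travel_min, charge_min, stations):
    """Hypothetical greedy online rule: pick the station with the earliest finish time.

    request_time: time (min) at which the vehicle requests a charge.
    travel_min: dict mapping station name -> travel time (min) from the vehicle.
    charge_min: charging duration (min), assumed equal at every station.
    """
    best = None
    for st in stations:
        arrival = request_time + travel_min[st.name]
        # choose the charger at this station that becomes free first
        idx = min(range(len(st.busy_until)), key=lambda i: st.busy_until[i])
        start = max(arrival, st.busy_until[idx])  # wait in the queue if needed
        finish = start + charge_min
        if best is None or finish < best[0]:
            best = (finish, st, idx)
    finish, st, idx = best
    st.busy_until[idx] = finish  # reserve the chosen charger for this vehicle
    return st.name, finish


a = Station("A", [40.0])       # one charger, occupied until t = 40
b = Station("B", [0.0, 0.0])   # two free chargers, but farther away
travel = {"A": 5.0, "B": 15.0}
print(assign(10.0, travel, 30.0, [a, b]))  # ('B', 55.0)
```

Because decisions are made one request at a time with no knowledge of future requests, this is an online rule; the paper's bi-level model instead evaluates station locations under its own charging policy, which this sketch does not reproduce.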
-
Multistream CNN for Robust Acoustic Modeling
Authors:
Kyu J. Han,
Jing Pan,
Venkata Krishna Naveen Tadala,
Tao Ma,
Dan Povey
Abstract:
This paper proposes multistream CNN, a novel neural network architecture for robust acoustic modeling in speech recognition tasks. To achieve robustness, the proposed architecture processes input speech at diverse temporal resolutions by applying different dilation rates to convolutional neural networks across multiple streams. The dilation rates are selected from multiples of a sub-sampling rate of 3 frames. Each stream stacks TDNN-F layers (a variant of 1D CNN), and the output embedding vectors from the streams are concatenated and then projected to the final layer. We validate the effectiveness of the proposed multistream CNN architecture by showing consistent improvements over Kaldi's best TDNN-F model across various data sets. Multistream CNN improves the WER of the test-other set in the LibriSpeech corpus by 12% (relative). On custom data from ASAPP's production ASR system for a contact center, it achieves a relative WER improvement of 11% on customer channel audio, demonstrating its robustness to data in the wild. In terms of real-time factor, multistream CNN outperforms the baseline TDNN-F by 15%, which also suggests its practicality for production systems. When combined with self-attentive SRU LM rescoring, multistream CNN helps ASAPP achieve the best WER of 1.75% on test-clean in LibriSpeech.
Submitted 25 April, 2021; v1 submitted 21 May, 2020;
originally announced May 2020.
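A simplified NumPy sketch of the multistream idea follows. Each stream applies a dilated 1-D convolution with its own dilation rate, the stream outputs are center-cropped to a common length so their frames line up, concatenated along the feature axis, and linearly projected. The dilations 3, 6 and 9 are illustrative multiples of the 3-frame sub-sampling rate, and a single ReLU-activated dilated convolution stands in for each stack of TDNN-F layers, so this is not Kaldi's or ASAPP's implementation. The names `dilated_conv1d` and `multistream` are hypothetical.

```python
import numpy as np


def dilated_conv1d(x, w, dilation):
    """'Valid' 1-D dilated convolution (cross-correlation form).

    x: (T, C_in) input frames; w: (K, C_in, C_out) kernel taps.
    Returns an array of shape (T - (K - 1) * dilation, C_out).
    """
    K = w.shape[0]
    t_out = x.shape[0] - (K - 1) * dilation
    out = np.zeros((t_out, w.shape[2]))
    for k in range(K):
        # tap k reads frames k * dilation positions after the output index
        out += x[k * dilation : k * dilation + t_out] @ w[k]
    return out


def multistream(x, kernels, dilations, proj):
    """Hypothetical simplified multistream block: one ReLU dilated conv per stream."""
    outs = [np.maximum(dilated_conv1d(x, w, d), 0.0) for w, d in zip(kernels, dilations)]
    t_min = min(o.shape[0] for o in outs)
    # center-crop every stream to the shortest length so frames line up in time
    aligned = []
    for o in outs:
        start = (o.shape[0] - t_min) // 2
        aligned.append(o[start : start + t_min])
    h = np.concatenate(aligned, axis=1)  # (t_min, sum of stream widths)
    return h @ proj                      # project to the final layer's size


rng = np.random.default_rng(0)
x = rng.standard_normal((50, 4))         # 50 frames of 4-dim features
dilations = [3, 6, 9]                    # illustrative multiples of 3 frames
kernels = [rng.standard_normal((3, 4, 8)) for _ in dilations]
proj = rng.standard_normal((3 * 8, 10))
y = multistream(x, kernels, dilations, proj)
print(y.shape)  # (32, 10)
```

Larger dilations give a stream a wider temporal receptive field with the same number of taps, which is how the streams see the input at different temporal resolutions.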