Search | arXiv e-print repository

Comparative Study of Data-driven Area Inertia Estimation Approaches on WECC Power Systems

Authors: Bendong Tan, Jiangkai Peng, Ningchao Gao, Junbo Zhao, Jin Tan

Abstract: With the increasing integration of inverter-based resources into the power grid, there has been a notable reduction in system inertia, potentially compromising frequency stability. To assess the suitability of existing area inertia estimation techniques for real-world power systems, this paper presents a rigorous comparative analysis of system identification, measurement reconstruction, and electromechanical oscillation-based area inertia estimation methodologies, specifically applied to the large-scale and multi-area WECC 240-bus power system. Comprehensive results show that the system identification-based approach exhibits superior robustness and accuracy relative to its counterparts.

Submitted 1 August, 2024; originally announced August 2024.

arXiv:2406.12268 [pdf, ps, other]

Channel Twinning: An Enabler for Next-Generation Ubiquitous Wireless Connectivity

Authors: Yashuai Cao, Jingbo Tan, Jintao Wang, Wei Ni, Ekram Hossain, Dusit Niyato

Abstract: The emerging concept of channel twinning (CT) has great potential to become a key enabler of ubiquitous connectivity in next-generation (xG) wireless systems. By fusing multimodal sensor data, CT advocates a high-fidelity and low-overhead channel acquisition paradigm, which is promising to provide accurate channel prediction in cross-domain and high-mobility scenarios of ubiquitous xG networks. However, the current literature lacks a universal CT architecture to address the challenges of heterogeneous scenarios, data, and resources in xG networks, which hinders the widespread deployment and applications of CT. This article discusses a new modularized CT architecture to bridge the barriers to scene recognition, cooperative sensing, and decentralized training. Based on the modularized design of CT, universal channel modeling, multimodal cooperative sensing, and lightweight twin modeling are described. Moreover, this article provides a concise definition, technical features, and case studies of CT, followed by potential applications of CT-empowered ubiquitous connectivity and some issues requiring future investigations.

Submitted 18 June, 2024; originally announced June 2024.

Comments: submitted to IEEE

arXiv:2406.08122 [pdf]

Fully Few-shot Class-incremental Audio Classification Using Expandable Dual-embedding Extractor

Authors: Yongjie Si, Yanxiong Li, Jialong Li, Jiaxin Tan, Qianhua He

Abstract: It's assumed that training data is sufficient in base session of few-shot class-incremental audio classification. However, it's difficult to collect abundant samples for model training in base session in some practical scenarios due to the data scarcity of some classes. This paper explores a new problem of fully few-shot class-incremental audio classification with few training samples in all sessions. Moreover, we propose a method using expandable dual-embedding extractor to solve it. The proposed model consists of an embedding extractor and an expandable classifier. The embedding extractor consists of a pretrained Audio Spectrogram Transformer (AST) and a finetuned AST. The expandable classifier consists of prototypes and each prototype represents a class. Experiments are conducted on three datasets (LS-100, NSynth-100 and FSC-89). Results show that our method exceeds seven baseline ones in average accuracy with statistical significance. Code is at: https://github.com/YongjieSi/EDE.

Submitted 12 June, 2024; originally announced June 2024.

Comments: Accepted for publication on Interspeech 2024. 5 pages, 3 figures, 5 tables

arXiv:2406.08119 [pdf]

Low-Complexity Acoustic Scene Classification Using Parallel Attention-Convolution Network

Authors: Yanxiong Li, Jiaxin Tan, Guoqing Chen, Jialong Li, Yongjie Si, Qianhua He

Abstract: This work is an improved system that we submitted to task 1 of DCASE2023 challenge. We propose a method of low-complexity acoustic scene classification by a parallel attention-convolution network which consists of four modules, including pre-processing, fusion, global and local contextual information extraction. The proposed network is computationally efficient to capture global and local contextual information from each audio clip. In addition, we integrate other techniques into our method, such as knowledge distillation, data augmentation, and adaptive residual normalization. When evaluated on the official dataset of DCASE2023 challenge, our method obtains the highest accuracy of 56.10% with parameter number of 5.21 kilo and multiply-accumulate operations of 1.44 million. It exceeds the top two systems of DCASE2023 challenge in accuracy and complexity, and obtains state-of-the-art result. Code is at: https://github.com/Jessytan/Low-complexity-ASC.

Submitted 12 June, 2024; originally announced June 2024.

Comments: Accepted for publication on Interspeech 2024. 5 pages, 4 figures, 3 tables

arXiv:2406.03899 [pdf, other]

PLDNet: PLD-Guided Lightweight Deep Network Boosted by Efficient Attention for Handheld Dual-Microphone Speech Enhancement

Authors: Nan Zhou, Youhai Jiang, Jialin Tan, Chongmin Qi

Abstract: Low-complexity speech enhancement on mobile phones is crucial in the era of 5G. Thus, focusing on handheld mobile phone communication scenario, based on power level difference (PLD) algorithm and lightweight U-Net, we propose PLD-guided lightweight deep network (PLDNet), an extremely lightweight dual-microphone speech enhancement method that integrates the guidance of signal processing algorithm and lightweight attention-augmented U-Net. For the guidance information, we employ PLD algorithm to pre-process dual-microphone spectrum, and feed the output into subsequent deep neural network, which utilizes a lightweight U-Net with our proposed gated convolution augmented frequency attention (GCAFA) module to extract desired clean speech. Experimental results demonstrate that our proposed method achieves competitive performance with recent top-performing models while reducing computational cost by over 90%, highlighting the potential for low-complexity speech enhancement on mobile phones.

Submitted 6 June, 2024; originally announced June 2024.

Comments: Accepted at Interspeech 2024

arXiv:2403.14345 [pdf, other]

Modem Optimization of High-Mobility Scenarios: A Deep-Learning-Inspired Approach

Authors: Hengyu Zhang, Xuehan Wang, Jingbo Tan, Jintao Wang

Abstract: The next generation wireless communication networks are required to support high-mobility scenarios, such as reliable data transmission for high-speed railways. Nevertheless, widely utilized multi-carrier modulation, the orthogonal frequency division multiplex (OFDM), cannot deal with the severe Doppler spread brought by high mobility. To address this problem, some new modulation schemes, e.g. orthogonal time frequency space and affine frequency division multiplexing, have been proposed with different design criteria from OFDM, which promote reliability with the cost of extremely high implementation complexity. On the other hand, end-to-end systems achieve excellent gains by exploiting neural networks to replace traditional transmitters and receivers, but have to retrain and update continually with channel varying. In this paper, we propose the Modem Network (ModNet) to design a novel modem scheme. Compared with end-to-end systems, channels are directly fed into the network and we can directly get a modem scheme through ModNet. Then, the Tri-Phase training strategy is proposed, which mainly utilizes the siamese structure to unify the learned modem scheme without retraining frequently faced up with time-varying channels. Simulation results show the proposed modem scheme outperforms OFDM systems under different high-mobility channel statistics.

Submitted 21 March, 2024; originally announced March 2024.

Comments: 6 pages, 4 figures, accepted by ICC 2024 Workshop - APATN

arXiv:2403.12506 [pdf, ps, other]

Sparse Estimation for XL-MIMO with Unified LoS/NLoS Representation

Authors: Xu Shi, Xuehan Wang, Jingbo Tan, Jintao Wang

Abstract: Extremely large-scale antenna array (ELAA) is promising as one of the key ingredients for the sixth generation (6G) of wireless communications. The electromagnetic propagation of spherical wavefronts introduces an additional distance-dependent dimension beyond conventional beamspace. In this paper, we first present one concise closed-form channel formulation for extremely large-scale multiple-input multiple-output (XL-MIMO). All line-of-sight (LoS) and non-line-of-sight (NLoS) paths, far-field and near-field scenarios, and XL-MIMO and XL-MISO channels are unified under the framework, where additional Vandermonde windowing matrix is exclusively considered for LoS path. Under this framework, we further propose one low-complexity unified LoS/NLoS orthogonal matching pursuit (XL-UOMP) algorithm for XL-MIMO channel estimation. The simulation results demonstrate the superiority of the proposed algorithm on both estimation accuracy and pilot consumption.

Submitted 19 March, 2024; originally announced March 2024.

Comments: ICC 2024

arXiv:2401.11449 [pdf, other]

doi 10.3390/s24020533

Energy Consumption Analysis for Continuous Phase Modulation in Smart-Grid Internet of Things of beyond 5G

Authors: Hongjian Gao, Yang Lu, Shaoshi Yang, Jingsheng Tan, Longlong Nie, Xinyi Qu

Abstract: Wireless sensor network (WSN) underpinning the smart-grid Internet of Things (SG-IoT) has been a popular research topic in recent years due to its great potential for enabling a wide range of important applications. However, the energy consumption (EC) characteristic of sensor nodes is a key factor that affects the operational performance (e.g., lifetime of sensors) and the total cost of ownership of WSNs. In this paper, to find the modulation techniques suitable for WSNs, we investigate the EC characteristic of continuous phase modulation (CPM), which is an attractive modulation scheme candidate for WSNs because of its constant envelope property. We first develop an EC model for the sensor nodes of WSNs by considering the circuits and a typical communication protocol that relies on automatic repeat request (ARQ)-based retransmissions to ensure successful data delivery. Then, we use this model to analyze the EC characteristic of CPM under various configurations of modulation parameters. Furthermore, we compare the EC characteristic of CPM with that of other representative modulation schemes, such as offset quadrature phase-shift keying (OQPSK) and quadrature amplitude modulation (QAM), which are commonly used in communication protocols of WSNs. Our analysis and simulation results provide insights into the EC characteristics of multiple modulation schemes in the context of WSNs; thus, they are beneficial for designing energy-efficient SG-IoT in the beyond-5G (B5G) and the 6G era.

Submitted 21 January, 2024; originally announced January 2024.

Comments: 7 figures, 2 tables

Journal ref: Sensors, vol. 24, no. 2, pp. 1-14, article number 533, Jan. 2024

arXiv:2312.16884 [pdf]

Binaural recording methods with analysis on inter-aural time, level, and phase differences

Authors: Johann Kay Ann Tan

Abstract: Binaural recordings are a form of stereophonic recording method that replicates how human ears perceive sound. These types of recordings create a 3D aural image around the listener and are extremely immersive when well recorded and listened to appropriately with headphones. It has wide applications in video, podcast, and gaming formats -- allowing the listener to feel like they are there. Although binaural formats are seldom used for music applications, they have also been utilized in music ranging from Rock, Jazz, Acoustic, and Classical. In this paper, we will investigate the acoustical phenomenon that produces the binaural effect in audio recordings -- including the ITD (inter-aural time difference), the ILD (inter-aural level difference), the IPD (inter-aural phase difference) as well as the monaural spectral difference that occurs between two ears so we can better understand the replication of human hearing in binaural recordings. Binaural recordings differ from regular stereophonic recordings as they are arranged in a specific way to account for HRTF (Head-related transfer function). The most common method of binaural recordings is with two high-quality omni-directional microphones affixed on a dummy head where the ears are located, although other methods exist without the use of a full dummy head.

Submitted 28 December, 2023; originally announced December 2023.

arXiv:2310.19113 [pdf, other]

Dynamic V2X Autonomous Perception from Road-to-Vehicle Vision

Authors: Jiayao Tan, Fan Lyu, Linyan Li, Fuyuan Hu, Tingliang Feng, Fenglei Xu, Rui Yao

Abstract: Vehicle-to-everything (V2X) perception is an innovative technology that enhances vehicle perception accuracy, thereby elevating the security and reliability of autonomous systems. However, existing V2X perception methods focus on static scenes from mainly vehicle-based vision, which is constrained by sensor capabilities and communication loads. To adapt V2X perception models to dynamic scenes, we propose to build V2X perception from road-to-vehicle vision and present Adaptive Road-to-Vehicle Perception (AR2VP) method. In AR2VP, we leverage roadside units to offer stable, wide-range sensing capabilities and serve as communication hubs. AR2VP is devised to tackle both intra-scene and inter-scene changes. For the former, we construct a dynamic perception representing module, which efficiently integrates vehicle perceptions, enabling vehicles to capture a more comprehensive range of dynamic factors within the scene. Moreover, we introduce a road-to-vehicle perception compensating module, aimed at preserving the maximized roadside unit perception information in the presence of intra-scene changes. For inter-scene changes, we implement an experience replay mechanism leveraging the roadside unit's storage capacity to retain a subset of historical scene data, maintaining model robustness in response to inter-scene shifts. We conduct perception experiment on 3D object detection and segmentation, and the results show that AR2VP excels in both performance-bandwidth trade-offs and adaptability within dynamic environments.

Submitted 29 October, 2023; originally announced October 2023.

arXiv:2310.14934 [pdf]

Robust Depth Linear Error Decomposition with Double Total Variation and Nuclear Norm for Dynamic MRI Reconstruction

Authors: Junpeng Tan, Chunmei Qing, Xiangmin Xu

Abstract: Compressed Sensing (CS) significantly speeds up Magnetic Resonance Image (MRI) processing and achieves accurate MRI reconstruction from under-sampled k-space data. According to the current research, there are still several problems with dynamic MRI k-space reconstruction based on CS. 1) There are differences between the Fourier domain and the Image domain, and the differences between MRI processing of different domains need to be considered. 2) As three-dimensional data, dynamic MRI has its spatial-temporal characteristics, which need to calculate the difference and consistency of surface textures while preserving structural integrity and uniqueness. 3) Dynamic MRI reconstruction is time-consuming and computationally resource-dependent. In this paper, we propose a novel robust low-rank dynamic MRI reconstruction optimization model via highly under-sampled and Discrete Fourier Transform (DFT) called the Robust Depth Linear Error Decomposition Model (RDLEDM). Our method mainly includes linear decomposition, double Total Variation (TV), and double Nuclear Norm (NN) regularizations. By adding linear image domain error analysis, the noise is reduced after under-sampled and DFT processing, and the anti-interference ability of the algorithm is enhanced. Double TV and NN regularizations can utilize both spatial-temporal characteristics and explore the complementary relationship between different dimensions in dynamic MRI sequences.
In addition, due to the non-smoothness and non-convexity of TV and NN terms, it is difficult to optimize the unified objective model. To address this issue, we utilize a fast algorithm by solving a primal-dual form of the original problem. Compared with five state-of-the-art methods, extensive experiments on dynamic MRI data demonstrate the superior performance of the proposed method in terms of both reconstruction accuracy and time complexity.

Submitted 23 October, 2023; originally announced October 2023.

arXiv:2310.10209 [pdf, other]

Self-supervised Fetal MRI 3D Reconstruction Based on Radiation Diffusion Generation Model

Authors: Junpeng Tan, Xin Zhang, Yao Lv, Xiangmin Xu, Gang Li

Abstract: Although the use of multiple stacks can handle slice-to-volume motion correction and artifact removal problems, there are still several problems: 1) The slice-to-volume method usually uses slices as input, which cannot solve the problem of uniform intensity distribution and complementarity in regions of different fetal MRI stacks; 2) The integrity of 3D space is not considered, which adversely affects the discrimination and generation of globally consistent information in fetal MRI; 3) Fetal MRI with severe motion artifacts in the real-world cannot achieve high-quality super-resolution reconstruction. To address these issues, we propose a novel fetal brain MRI high-quality volume reconstruction method, called the Radiation Diffusion Generation Model (RDGM). It is a self-supervised generation method, which incorporates the idea of Neural Radiation Field (NeRF) based on the coordinate generation and diffusion model based on super-resolution generation. To solve regional intensity heterogeneity in different directions, we use a pre-trained transformer model for slice registration, and then, a new regionally Consistent Implicit Neural Representation (CINR) network sub-module is proposed. CINR can generate the initial volume by combining a coordinate association map of two different coordinate mapping spaces. To enhance volume global consistency and discrimination, we introduce the Volume Diffusion Super-resolution Generation (VDSG) mechanism.
The global intensity discriminant generation from volume-to-volume is carried out using the idea of diffusion generation, and CINR becomes the deviation intensity generation network of the volume-to-volume diffusion model. Finally, the experimental results on real-world fetal brain MRI stacks demonstrate the state-of-the-art performance of our method.

Submitted 16 October, 2023; originally announced October 2023.

arXiv:2310.07464 [pdf]

Deep Learning Predicts Biomarker Status and Discovers Related Histomorphology Characteristics for Low-Grade Glioma

Authors: Zijie Fang, Yihan Liu, Yifeng Wang, Xiangyang Zhang, Yang Chen, Changjing Cai, Yiyang Lin, Ying Han, Zhi Wang, Shan Zeng, Hong Shen, Jun Tan, Yongbing Zhang

Abstract: Biomarker detection is an indispensable part in the diagnosis and treatment of low-grade glioma (LGG). However, current LGG biomarker detection methods rely on expensive and complex molecular genetic testing, for which professionals are required to analyze the results, and intra-rater variability is often reported. To overcome these challenges, we propose an interpretable deep learning pipeline, a Multi-Biomarker Histomorphology Discoverer (Multi-Beholder) model based on the multiple instance learning (MIL) framework, to predict the status of five biomarkers in LGG using only hematoxylin and eosin-stained whole slide images and slide-level biomarker status labels. Specifically, by incorporating the one-class classification into the MIL framework, accurate instance pseudo-labeling is realized for instance-level supervision, which greatly complements the slide-level labels and improves the biomarker prediction performance. Multi-Beholder demonstrates superior prediction performance and generalizability for five LGG biomarkers (AUROC=0.6469-0.9735) in two cohorts (n=607) with diverse races and scanning protocols. Moreover, the excellent interpretability of Multi-Beholder allows for discovering the quantitative and qualitative correlations between biomarker status and histomorphology characteristics. Our pipeline not only provides a novel approach for biomarker prediction, enhancing the applicability of molecular treatments for LGG patients but also facilitates the discovery of new mechanisms in molecular functionality and LGG progression.

Submitted 11 October, 2023; originally announced October 2023.

Comments: 47 pages, 6 figures

arXiv:2310.05938 [pdf, other]

doi 10.1145/3577190.3614114

Component attention network for multimodal dance improvisation recognition

Authors: Jia Fu, Jiarui Tan, Wenjie Yin, Sepideh Pashami, Mårten Björkman

Abstract: Dance improvisation is an active research topic in the arts. Motion analysis of improvised dance can be challenging due to its unique dynamics. Data-driven dance motion analysis, including recognition and generation, is often limited to skeletal data. However, data of other modalities, such as audio, can be recorded and benefit downstream tasks. This paper explores the application and performance of multimodal fusion methods for human motion recognition in the context of dance improvisation. We propose an attention-based model, component attention network (CANet), for multimodal fusion on three levels: 1) feature fusion with CANet, 2) model fusion with CANet and graph convolutional network (GCN), and 3) late fusion with a voting strategy. We conduct thorough experiments to analyze the impact of each modality in different fusion methods and distinguish critical temporal or component features. We show that our proposed model outperforms the two baseline methods, demonstrating its potential for analyzing improvisation in dance.

Submitted 24 August, 2023; originally announced October 2023.

Comments: Accepted to 25th ACM International Conference on Multimodal Interaction (ICMI 2023)

ACM Class: I.2; I.5.4

arXiv:2310.05021 [pdf, other]

Toward Intelligent Emergency Control for Large-scale Power Systems: Convergence of Learning, Physics, Computing and Control

Authors: Qiuhua Huang, Renke Huang, Tianzhixi Yin, Sohom Datta, Xueqing Sun, Jason Hou, Jie Tan, Wenhao Yu, Yuan Liu, Xinya Li, Bruce Palmer, Ang Li, Xinda Ke, Marianna Vaiman, Song Wang, Yousu Chen

Abstract: This paper has delved into the pressing need for intelligent emergency control in large-scale power systems, which are experiencing significant transformations and are operating closer to their limits with more uncertainties. Learning-based control methods are promising and have shown effectiveness for intelligent power system control. However, when they are applied to large-scale power systems, there are multifaceted challenges such as scalability, adaptiveness, and security posed by the complex power system landscape, which demand comprehensive solutions. The paper first proposes and instantiates a convergence framework for integrating power systems physics, machine learning, advanced computing, and grid control to realize intelligent grid control at a large scale. Our developed methods and platform based on the convergence framework have been applied to a large (more than 3000 buses) Texas power system, and tested with 56000 scenarios. Our work achieved a 26% reduction in load shedding on average and outperformed existing rule-based control in 99.7% of the test scenarios. The results demonstrated the potential of the proposed convergence framework and DRL-based intelligent control for the future grid.

Submitted 8 October, 2023; originally announced October 2023.

Comments: submitted to PSCC 2024

arXiv:2308.13002 [pdf, other]

Head-Neck Dual-energy CT Contrast Media Reduction Using Diffusion Models

Authors: Qing Lyu, Josh Tan, Megan E. Lipford, Chuang Niu, Micheal E. Zapadka, Christopher M. Lack, Jonathan D. Clemente, Christopher T. Whitlow, Ge Wang

Abstract: Iodinated contrast media is essential for dual-energy computed tomography (DECT) angiography. Previous studies show that iodinated contrast media may cause side effects, and the interruption of the supply chain in 2022 led to a severe contrast media shortage in the US. Both factors justify the necessity of contrast media reduction in relevant clinical applications. In this study, we propose a diffusion model-based deep learning framework to address this challenge. First, we simulate different levels of low contrast dosage DECT scans from the standard normal contrast dosage DECT scans using material decomposition. Conditional denoising diffusion probabilistic models are then trained to enhance the contrast media and create contrast-enhanced images. Our results demonstrate that the proposed methods can generate high-quality contrast-enhanced results even for images obtained with as low as 12.5% of the normal contrast dosage. Furthermore, our method outperforms selected competing methods in a human reader study.

Submitted 24 August, 2023; originally announced August 2023.

arXiv:2308.10196 [pdf, other]

Blind Face Restoration for Under-Display Camera via Dictionary Guided Transformer

Authors: Jingfan Tan, Xiaoxu Chen, Tao Wang, Kaihao Zhang, Wenhan Luo, Xiaocun Cao

Abstract: By hiding the front-facing camera below the display panel, Under-Display Camera (UDC) provides users with a full-screen experience. However, due to the characteristics of the display, images taken by UDC suffer from significant quality degradation. Methods have been proposed to tackle UDC image restoration and advances have been achieved. There are still no specialized methods and datasets for restoring UDC face images, which may be the most common problem in the UDC scene. To this end, considering color filtering, brightness attenuation, and diffraction in the imaging process of UDC, we propose a two-stage network UDC Degradation Model Network named UDC-DMNet to synthesize UDC images by modeling the processes of UDC imaging. Then we use UDC-DMNet and high-quality face images from FFHQ and CelebA-Test to create UDC face training datasets FFHQ-P/T and testing datasets CelebA-Test-P/T for UDC face restoration. We propose a novel dictionary-guided transformer network named DGFormer. Introducing the facial component dictionary and the characteristics of the UDC image in the restoration makes DGFormer capable of addressing blind face restoration in UDC scenarios. Experiments show that our DGFormer and UDC-DMNet achieve state-of-the-art performance.

Submitted 1 December, 2023; v1 submitted 20 August, 2023; originally announced August 2023.

Comments: To appear in IEEE TCSVT

arXiv:2307.12595 [pdf, ps, other]

Underlaid Sensing Pilot for Integrated Sensing and Communications

Authors: Pu Yuan, Hao Liu, Junjie Tan, Dajie Jiang, Lei Yan

Abstract: This paper investigates a novel underlaid sensing pilot signal design for integrated sensing and communications (ISAC) in an OFDM-based communication system. The proposed two-dimensional (2D) pilot signal is first generated on the delay-Doppler (DD) plane and then converted to the time-frequency (TF) plane for multiplexing with the OFDM data symbols. The sensing signal underlays the OFDM data, allowing for the sharing of time-frequency resources. In this framework, sensing detection is implemented based on a simple 2D correlation, taking advantage of the favorable auto-correlation properties of the sensing pilot. In the communication part, the sensing pilot, serving as a known signal, can be utilized for channel estimation and equalization to ensure optimal symbol detection performance. The underlaid sensing pilot demonstrates good scalability and can adapt to different delay and Doppler resolution requirements without violating the OFDM frame structure. Experimental results show the effective sensing performance of the proposed pilot, with only a small fraction of power shared from the OFDM data, while maintaining satisfactory symbol detection performance in communication.

Submitted 24 July, 2023; originally announced July 2023.

Comments: 13 pages, 6 figures

arXiv:2304.02649 [pdf, other]

Specialty-Oriented Generalist Medical AI for Chest CT Screening

Authors: Chuang Niu, Qing Lyu, Christopher D. Carothers, Parisa Kaviani, Josh Tan, Pingkun Yan, Mannudeep K. Kalra, Christopher T. Whitlow, Ge Wang

Abstract: Modern medical records include a vast amount of multimodal free text clinical data and imaging data from radiology, cardiology, and digital pathology. Fully mining such big data requires multitasking; otherwise, occult but important aspects may be overlooked, adversely affecting clinical management and population healthcare. Despite remarkable successes of AI in individual tasks with single-modal data, the progress in developing generalist medical AI remains relatively slow to combine multimodal data for multitasks because of the dual challenges of data curation and model architecture. The data challenge involves querying and curating multimodal structured and unstructured text, alphanumeric, and especially 3D tomographic scans on an individual patient level for real-time decisions and on a scale to estimate population health statistics. The model challenge demands a scalable and adaptable network architecture to integrate multimodal datasets for diverse clinical tasks. Here we propose the first-of-its-kind medical multimodal-multitask foundation model (M3FM) with application in lung cancer screening and related tasks. After we curated a comprehensive multimodal multitask dataset consisting of 49 clinical data types including 163,725 chest CT series and 17 medical tasks involved in LCS, we develop a multimodal question-answering framework as a unified training and inference strategy to synergize multimodal information and perform multiple tasks via free-text prompting. M3FM consistently outperforms the state-of-the-art single-modal task-specific models, identifies multimodal data elements informative for clinical tasks and flexibly adapts to new tasks with a small out-of-distribution dataset. As a specialty-oriented generalist medical AI model, M3FM paves the way for similar breakthroughs in other areas of medicine, closing the gap between specialists and the generalist.

Submitted 24 April, 2024; v1 submitted 3 April, 2023; originally announced April 2023.

arXiv:2303.13227 [pdf, other]

Confidence-Aware and Self-Supervised Image Anomaly Localisation

Authors: Johanna P. Müller, Matthew Baugh, Jeremy Tan, Mischa Dombrowski, Bernhard Kainz

Abstract: Universal anomaly detection remains a challenging problem in machine learning and medical image analysis. It is possible to learn an expected distribution from a single class of normative samples, e.g., through epistemic uncertainty estimates, auto-encoding models, or from synthetic anomalies in a self-supervised way. The performance of self-supervised anomaly detection approaches is still inferior compared to methods that use examples from known unknown classes to shape the decision boundary. However, outlier exposure methods often do not identify unknown unknowns. Here we discuss an improved self-supervised single-class training strategy that supports the approximation of probabilistic inference with loosened feature locality constraints. We show that up-scaling of gradients with histogram-equalised images is beneficial for recently proposed self-supervision tasks. Our method is integrated into several out-of-distribution (OOD) detection models and we show evidence that our method outperforms the state-of-the-art on various benchmark datasets.

Submitted 2 October, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

Comments: Accepted for MICCAI UNSURE Workshop 2023 (Spotlight)

arXiv:2301.05781 [pdf, other]

Analysis of November 21, 2021, Kaua`i Island Power System 18-20 Hz Oscillations

Authors: Shuan Dong, Bin Wang, Jin Tan, Cameron J. Kruse, Brad W. Rockwell, Anderson Hoke

Abstract: This letter discusses the 18-20 Hz oscillation event at 05:30 am on November 21, 2021, in Kaua`i's power system following the trip of an oil power plant. As far as the authors are aware, this is the first report of a transmission system-wide subsynchronous oscillation driven by inverter-based resources (though the system in question is relatively small). In this letter, we leverage two data-based methods, the dissipating energy flow method and the sub/super-synchronous power flow method, to locate the sources of the oscillation. Also, we build an electromagnetic transient model of the Kaua`i power system and replay the 18-20 Hz oscillation. Finally, we propose two mitigation methods and validate their effectiveness via numerical simulation.

Submitted 10 February, 2023; v1 submitted 13 January, 2023; originally announced January 2023.

arXiv:2301.05187 [pdf, other]

WIRE: Wavelet Implicit Neural Representations

Authors: Vishwanath Saragadam, Daniel LeJeune, Jasper Tan, Guha Balakrishnan, Ashok Veeraraghavan, Richard G. Baraniuk

Abstract: Implicit neural representations (INRs) have recently advanced numerous vision-related areas. INR performance depends strongly on the choice of the nonlinear activation function employed in its multilayer perceptron (MLP) network. A wide range of nonlinearities have been explored, but, unfortunately, current INRs designed to have high accuracy also suffer from poor robustness (to signal noise, parameter variation, etc.). Inspired by harmonic analysis, we develop a new, highly accurate and robust INR that does not exhibit this tradeoff. Wavelet Implicit neural REpresentation (WIRE) uses a continuous complex Gabor wavelet activation function that is well-known to be optimally concentrated in space-frequency and to have excellent biases for representing images. A wide range of experiments (image denoising, image inpainting, super-resolution, computed tomography reconstruction, image overfitting, and novel view synthesis with neural radiance fields) demonstrate that WIRE defines the new state of the art in INR accuracy, training time, and robustness.

Submitted 5 January, 2023; originally announced January 2023.

arXiv:2212.02715 [pdf, other]

Efficient Learning of Voltage Control Strategies via Model-based Deep Reinforcement Learning

Authors: Ramij R. Hossain, Tianzhixi Yin, Yan Du, Renke Huang, Jie Tan, Wenhao Yu, Yuan Liu, Qiuhua Huang

Abstract: This article proposes a model-based deep reinforcement learning (DRL) method to design emergency control strategies for short-term voltage stability problems in power systems. Recent advances show promising results in model-free DRL-based methods for power systems, but model-free methods suffer from poor sample efficiency and training time, both critical for making state-of-the-art DRL algorithms practically applicable. A DRL agent learns an optimal policy via a trial-and-error method while interacting with the real-world environment. It is desirable to minimize the direct interaction of the DRL agent with the real-world power grid due to its safety-critical nature. Additionally, state-of-the-art DRL-based policies are mostly trained using a physics-based grid simulator where dynamic simulation is computationally intensive, lowering the training efficiency. We propose a novel model-based DRL framework where a deep neural network (DNN)-based dynamic surrogate model, instead of a real-world power grid or physics-based simulation, is utilized with the policy learning framework, making the process faster and more sample-efficient. However, stabilizing model-based DRL is challenging because of the complex system dynamics of large-scale power systems. We solved these issues by incorporating imitation learning to have a warm start in policy learning, reward-shaping, and multi-step surrogate loss. Finally, we achieved 97.5% sample efficiency and 87.7% training efficiency for an application to the IEEE 300-bus test system.

Submitted 5 December, 2022; originally announced December 2022.

arXiv:2210.10865 [pdf, other]

Robotic Table Wiping via Reinforcement Learning and Whole-body Trajectory Optimization

Authors: Thomas Lew, Sumeet Singh, Mario Prats, Jeffrey Bingham, Jonathan Weisz, Benjie Holson, Xiaohan Zhang, Vikas Sindhwani, Yao Lu, Fei Xia, Peng Xu, Tingnan Zhang, Jie Tan, Montserrat Gonzalez

Abstract: We propose a framework to enable multipurpose assistive mobile robots to autonomously wipe tables to clean spills and crumbs. This problem is challenging, as it requires planning wiping actions while reasoning over uncertain latent dynamics of crumbs and spills captured via high-dimensional visual observations. Simultaneously, we must guarantee constraint satisfaction to enable safe deployment in unstructured cluttered environments. To tackle this problem, we first propose a stochastic differential equation to model crumbs and spill dynamics and absorption with a robot wiper. Using this model, we train a vision-based policy for planning wiping actions in simulation using reinforcement learning (RL). To enable zero-shot sim-to-real deployment, we dovetail the RL policy with a whole-body trajectory optimization framework to compute base and arm joint trajectories that execute the desired wiping motions while guaranteeing constraint satisfaction. We extensively validate our approach in simulation and on hardware. Video: https://youtu.be/inORKP4F3EI

Submitted 19 October, 2022; originally announced October 2022.

arXiv:2209.12305 [pdf, other]

Adnexal Mass Segmentation with Ultrasound Data Synthesis

Authors: Clara Lebbos, Jen Barcroft, Jeremy Tan, Johanna P. Muller, Matthew Baugh, Athanasios Vlontzos, Srdjan Saso, Bernhard Kainz

Abstract: Ovarian cancer is the most lethal gynaecological malignancy. The disease is most commonly asymptomatic at its early stages and its diagnosis relies on expert evaluation of transvaginal ultrasound images. Ultrasound is the first-line imaging modality for characterising adnexal masses, but it requires significant expertise and its analysis is subjective and labour-intensive, and therefore open to error. Hence, automating processes to facilitate and standardise the evaluation of scans is desired in clinical practice. Using supervised learning, we have demonstrated that segmentation of adnexal masses is possible; however, prevalence and label imbalance restrict the performance on under-represented classes. To mitigate this we apply a novel pathology-specific data synthesiser. We create synthetic medical images with their corresponding ground truth segmentations by using Poisson image editing to integrate less common masses into other samples. Our approach achieves the best performance across all classes, including an improvement of up to 8% when compared with nnU-Net baseline approaches.

Submitted 25 September, 2022; originally announced September 2022.

Journal ref: ASMUS 2022, LNCS 13565, p. 106, 2022

arXiv:2209.09413 [pdf, other]

A Unified Analytical Method to Quantify Three Types of Fast Frequency Response from Inverter-based Resources

Authors: Shuan Dong, Xin Fang, Jin Tan, Ningchao Gao, Xiaofan Cui, Anderson Hoke

Abstract: With more inverter-based resources (IBRs), our power systems have lower frequency nadirs following N-1 contingencies, and undesired under-frequency load shedding (UFLS) can occur. To address this challenge, IBRs can be programmed to provide at least three types of fast frequency response (FFR), e.g., step response, proportional response (P/f droop response), and derivative response (synthetic inertia). However, these heterogeneous FFR types complicate the study of power system frequency dynamics. Thus, this paper develops an analytical frequency nadir prediction method that allows for the consideration of all three potential forms of FFR provided by IBRs. The proposed method provides fast and accurate frequency nadir estimation after N-1 generation tripping contingencies. Our method is grounded on the closed-form solution for the frequency nadir, which is solved from the second-order system frequency response model considering the governor dynamics and three types of FFR. The simulation results in the IEEE 39-bus system with different types of FFR demonstrate that the proposed method provides an accurate and fast prediction of the frequency nadir under various disturbances.

Submitted 25 August, 2023; v1 submitted 19 September, 2022; originally announced September 2022.

arXiv:2205.14029 [pdf]

Lesion classification by model-based feature extraction: A differential affine invariant model of soft tissue elasticity

Authors: Weiguo Cao, Marc J. Pomeroy, Zhengrong Liang, Yongfeng Gao, Yongyi Shi, Jiaxing Tan, Fangfang Han, Jing Wang, Jianhua Ma, Hongbin Lu, Almas F. Abbasi, Perry J. Pickhardt

Abstract: The elasticity of soft tissues has been widely considered as a characteristic property to differentiate between healthy and vicious tissues and, therefore, motivated several elasticity imaging modalities, such as Ultrasound Elastography, Magnetic Resonance Elastography, and Optical Coherence Elastography. This paper proposes an alternative approach of modeling the elasticity using Computed Tomography (CT) imaging modality for model-based feature extraction machine learning (ML) differentiation of lesions. The model describes a dynamic non-rigid (or elastic) deformation in differential manifold to mimic the soft tissue elasticity under wave fluctuation in vivo. Based on the model, three local deformation invariants are constructed by two tensors defined by the first and second order derivatives from the CT images and used to generate elastic feature maps after normalization via a novel signal suppression method. The model-based elastic image features are extracted from the feature maps and fed to machine learning to perform lesion classifications. Two pathologically proven image datasets of colon polyps (44 malignant and 43 benign) and lung nodules (46 malignant and 20 benign) were used to evaluate the proposed model-based lesion classification. The outcomes of this modeling approach reached the score of area under the curve of the receiver operating characteristics of 94.2% for the polyps and 87.4% for the nodules, resulting in an average gain of 5% to 30% over ten existing state-of-the-art lesion classification methods. The gains by modeling tissue elasticity for ML differentiation of lesions are striking, indicating the great potential of exploring the modeling strategy to other tissue properties for ML differentiation of lesions.

Submitted 27 May, 2022; originally announced May 2022.

Comments: 12 pages, 4 figures, 3 tables

arXiv:2203.06959 [pdf, ps, other]

Data-Driven Robust Control for Discrete Linear Time-Invariant Systems: A Descriptor System Approach

Authors: Jiabao He, Xuan Zhang, Feng Xu, Junbo Tan, Xueqian Wang

Abstract: Given the recent surge of interest in data-driven control, this paper proposes a two-step method to study robust data-driven control for a parameter-unknown linear time-invariant (LTI) system that is affected by energy-bounded noises. First, two data experiments are designed and the corresponding data are collected; the investigated system is then equivalently written into a data-based descriptor system with structured parametric uncertainties. Second, combined with model-based control theory for descriptor systems, state feedback controllers are designed for this data-based descriptor system, which stabilize the original LTI system and guarantee the $H_\infty$ performance. Finally, a simulation example is provided to illustrate the effectiveness and merits of our method.

Submitted 14 March, 2022; originally announced March 2022.

arXiv:2201.08731 [pdf, other]

doi 10.23919/URSIGASS51995.2021.9560360

Low-Interception Waveform: To Prevent the Recognition of Spectrum Waveform Modulation via Adversarial Examples

Authors: Haidong Xie, Jia Tan, Xiaoying Zhang, Nan Ji, Haihua Liao, Zuguo Yu, Xueshuang Xiang, Naijin Liu

Abstract: Deep learning is applied to many complex tasks in the field of wireless communication, such as modulation recognition of spectrum waveforms, because of its convenience and efficiency. This leads to the problem of a malicious third party using a deep learning model to easily recognize the modulation format of the transmitted waveform. Some existing works address this problem directly using the concept of adversarial examples in the image domain without fully considering the characteristics of the waveform transmission in the physical world. Therefore, we propose a low-intercept waveform (LIW) generation method that can reduce the probability of the modulation being recognized by a third party without affecting the reliable communication of the friendly party. Our LIW exhibits significant low-interception performance even in the physical hardware experiment, decreasing the accuracy of the state-of-the-art model to approximately 15% with small perturbations.

Submitted 20 January, 2022; originally announced January 2022.

Comments: 4 pages, 4 figures, published in 2021 34th General Assembly and Scientific Symposium of the International Union of Radio Science, URSI GASS 2021

Journal ref: URSI GASS, 2021, pp. 1-4

arXiv:2112.14574 [pdf]

Industry 4.0: Challenges and success factors for adopting digital technologies in airports

Authors: Jia Hao Tan, Tariq Masood

Abstract: With the advent of Industry 4.0 technologies in the last decade, airports have undergone digitalisation to capitalise on the purported benefits of these technologies such as improved operational efficiency and passenger experience. The ongoing COVID-19 pandemic with the emergence of its variants (e.g. Delta, Omicron) has exacerbated the need for airports to adopt new technologies such as contactless and robotic technologies to facilitate travel during this pandemic. However, there is limited knowledge of recent challenges and success factors for adoption of digital technologies in airports. Therefore, through an industry survey of airport operators and managers around the world (n=102, 0.754<Composite Reliability<0.892; conducted during COVID-19), this study identifies the challenges faced in adopting Industry 4.0 technologies (n=20) as well as enhances understanding of best practices or success factors that supported technology adoption in airports. The widely used technology, organisation, environment (TOE) framework is used as a theoretical basis for the quantitative part of the questionnaire. A complementary qualitative part is used to underpin and extend the findings. The industry survey is the first-of-its-kind that was conducted to understand the implementation challenges that airport operators face in adopting Industry 4.0 technologies in the airport. The survey results have shown that the Industry 4.0 technologies were not implemented to a similar extent in airports despite the generic challenges that were faced in adopting the various Industry 4.0 technologies in the airport.

Submitted 29 December, 2021; originally announced December 2021.

Comments: 25 pages, 4 figures, 9 tables

arXiv:2112.14333 [pdf]

Adoption of Industry 4.0 technologies in airports -- A systematic literature review

Authors: Jia Hao Tan, Tariq Masood

Abstract: Airports have been constantly evolving and adopting digital technologies to improve operational efficiency, enhance passenger experience, generate ancillary revenues and boost capacity from existing infrastructure. The COVID-19 pandemic has also challenged airports and aviation stakeholders alike to adapt and manage new operational challenges such as facilitating a contactless travel experience and ensuring business continuity. Digitalisation using Industry 4.0 technologies offers opportunities for airports to address short-term challenges associated with the COVID-19 pandemic while also preparing for future long-term challenges that follow the crisis. Through a systematic literature review of 102 relevant articles, we discuss the current state of adoption of Industry 4.0 technologies in airports, the associated challenges as well as future research directions. The results of this review suggest that the implementation of Industry 4.0 technologies is slowly gaining traction within the airport environment, and shall continue to remain relevant in the digital transformation journeys in developing future airports.

Submitted 28 December, 2021; originally announced December 2021.

Comments: 25 pages, 2 figures, 2 tables, 106 references

arXiv:2112.03665 [pdf, ps, other]

Data-Driven Controllability Analysis and Stabilization for Linear Descriptor Systems

Authors: Jiabao He, Xuan Zhang, Feng Xu, Junbo Tan, Xueqian Wang

Abstract: For a parameter-unknown linear descriptor system, this paper proposes data-driven methods to test the system's type and controllability and then to stabilize it. First, a data-based condition is developed to identify whether this unknown system is a descriptor system or is equivalent to a normal system. Furthermore, various controllability concepts are tested by replacing the descriptor system's matrices with data. Finally, a data-based decomposing method is proposed to transfer the nominal system into its slow-fast subsystems' form, so that a state feedback controller for the slow subsystem can be obtained from persistently exciting input and state sequences. Meanwhile, due to the equivalent stabilizability between the nominal system and its slow subsystem, a state feedback controller which stabilizes the nominal system is also obtained. A simulation example is provided to illustrate the effectiveness of those methods.

Submitted 30 December, 2021; v1 submitted 7 December, 2021; originally announced December 2021.

arXiv:2111.14352 [pdf, other]

Physics-informed Evolutionary Strategy based Control for Mitigating Delayed Voltage Recovery

Authors: Yan Du, Qiuhua Huang, Renke Huang, Tianzhixi Yin, Jie Tan, Wenhao Yu, Xinya Li

Abstract: In this work we propose a novel data-driven, real-time power system voltage control method based on the physics-informed guided meta evolutionary strategy (ES). The main objective is to quickly provide an adaptive control strategy to mitigate the fault-induced delayed voltage recovery (FIDVR) problem. Reinforcement learning methods have been developed for the same or similar challenging control problems, but they suffer from training inefficiency and lack of robustness for "corner or unseen" scenarios. On the other hand, extensive physical knowledge has been developed in power systems but little has been leveraged in learning-based approaches. To address these challenges, we introduce the trainable action mask technique for flexibly embedding physical knowledge into RL models to rule out unnecessary or unfavorable actions, and achieve notable improvements in sample efficiency, control performance and robustness. Furthermore, our method leverages past learning experience to derive a surrogate gradient to guide and accelerate the exploration process in training. Case studies on the IEEE 300-bus system and comparisons with other state-of-the-art benchmark methods demonstrate the effectiveness and advantages of our method.

Submitted 29 November, 2021; originally announced November 2021.

arXiv:2111.09083 [pdf, other]

Trajectory Prediction & Path Planning for an Object Intercepting UAV with a Mounted Depth Camera

Authors: Jasper Tan, Arijit Dasgupta, Arjun Agrawal, Sutthiphong Srigrarom

Abstract: A novel control & software architecture using ROS C++ is introduced for object interception by a UAV with a mounted depth camera and no external aid. Existing work in trajectory prediction focused on the use of off-board tools like motion capture rooms to intercept thrown objects. The present study designs the UAV architecture to be completely on-board capable of object interception with the use of a depth camera and point cloud processing. The architecture uses an iterative trajectory prediction algorithm for non-propelled objects like a ping-pong ball. A variety of path planning approaches to object interception and their corresponding scenarios are discussed, evaluated & simulated in Gazebo. The successful simulations exemplify the potential of using the proposed architecture for the on-board autonomy of UAVs intercepting objects.

Submitted 17 November, 2021; originally announced November 2021.

Comments: Accepted at the 21st International Conference on Control, Automation and Systems 2021 (ICCAS 2021)

arXiv:2110.14064 [pdf, other]

How will electric vehicles affect traffic congestion and energy consumption: an integrated modelling approach

Authors: Artur Grigorev, Tuo Mao, Adam Berry, Joachim Tan, Loki Purushothaman, Adriana-Simona Mihaita

Abstract: This paper explores the impact of electric vehicles (EVs) on traffic congestion and energy consumption by proposing an integrated bi-level framework comprising: a) a dynamic micro-scale traffic simulation suitable for modelling current and hypothetical traffic and charging demand scenarios and b) a queue model for capturing the impact of fast charging station use, informed by traffic flows, travel distances, availability of charging infrastructure and estimated vehicle battery state of charge. To the best of our knowledge, this paper represents the first integrated analysis of potential traffic congestion and energy infrastructure impacts linked to EV uptake, based on real traffic flows and the placement and design of existing fast-charging infrastructure. Results showcase that the integrated queue-energy-transport modelling framework can correctly predict the limitations of the EV infrastructure as well as the traffic congestion evolution. The modelling approach identifies concrete pain points to be addressed in both traffic and energy management and planning. The code for this project can be found at: https://github.com/Future-Mobility-Lab/EV-charging-impact

Submitted 26 October, 2021; originally announced October 2021.

arXiv:2109.14964 [pdf, ps, other]

doi 10.1109/TVT.2023.3236179

Capacity Enhancement for Reconfigurable Intelligent Surface-Aided Wireless Network: from Regular Array to Irregular Array

Authors: Ruochen Su, Linglong Dai, Jingbo Tan, Mo Hao, Richard MacKenzie

Abstract: Reconfigurable intelligent surface (RIS) is promising for future 6G wireless communications. However, the increased number of RIS elements results in the high overhead for channel acquisition and the non-negligible power consumption. Therefore, how to improve the system capacity with limited RIS elements is essential. Unlike the classical regular RIS whose elements are arranged on a regular grid, in this paper, we propose an irregular RIS structure to improve the system capacity. The key idea is to irregularly configure a given number of RIS elements on an enlarged surface, which provides extra spatial degrees of freedom compared with the regular RIS. In this way, the received signal power can be enhanced, and thus the system capacity can be improved. Then, we formulate a joint topology and precoding optimization problem to maximize the capacity for irregular RIS-aided communication systems. Accordingly, a joint optimization algorithm with low complexity is proposed to alternately optimize the RIS topology and the precoding design. Particularly, a tabu search-based method is used to design the irregular RIS topology, and a neighbor extraction-based cross-entropy method is introduced to optimize the precoding design. Simulation results demonstrate that, subject to the constraint of limited RIS elements, the proposed irregular RIS can significantly enhance the system capacity.

Submitted 13 January, 2023; v1 submitted 30 September, 2021; originally announced September 2021.

Comments: Accepted by IEEE Transactions on Vehicular Technology. Simulation codes are provided at: http://oa.ee.tsinghua.edu.cn/dailinglong/publications/publications.html

arXiv:2108.08173 [pdf, other]

Wideband Channel Estimation for THz Massive MIMO

Authors: Jingbo Tan, Linglong Dai

Abstract: Terahertz (THz) communication is considered to be a promising technology for future 6G networks. To overcome the severe attenuation and relieve the high power consumption, massive MIMO with hybrid precoding has been widely considered for THz communication. However, accurate wideband channel estimation is challenging in THz massive MIMO systems. The existing wideband channel estimation schemes based on the ideal assumption of common sparse channel support will suffer from a severe performance loss due to the beam split effect. In this paper, we propose a beam split pattern detection based channel estimation scheme to realize reliable wideband channel estimation. Specifically, a comprehensive analysis on the angle-domain sparse structure of the wideband channel is provided by considering the beam split effect. Based on the analysis, we define a series of index sets called beam split patterns, which are proved to have a one-to-one match to different physical channel directions. Inspired by this one-to-one match, we propose to first estimate the physical channel direction by exploiting beam split patterns. Then, the sparse channel supports at different subcarriers can be obtained by utilizing a support detection window. This support detection window is generated by expanding the beam split pattern which is determined by the obtained physical channel direction. The above estimation procedure will be repeated path by path until all path components are estimated.
The proposed scheme exploits the wideband channel property implied by the beam split effect, which can significantly improve the channel estimation accuracy. Simulation results show that the proposed scheme is able to achieve higher accuracy than existing schemes.

Submitted 18 August, 2021; originally announced August 2021.

Comments: This paper has been accepted by China Communications. Simulation codes are provided to reproduce the results in this paper: http://oa.ee.tsinghua.edu.cn/dailinglong/publications/publications.html

arXiv:2107.13465 [pdf]

A Proof-of-Concept Study of Artificial Intelligence Assisted Contour Revision

Authors: Ti Bai, Anjali Balagopal, Michael Dohopolski, Howard E. Morgan, Rafe McBeth, Jun Tan, Mu-Han Lin, David J. Sher, Dan Nguyen, Steve Jiang

Abstract: Automatic segmentation of anatomical structures is critical for many medical applications. However, the results are not always clinically acceptable and require tedious manual revision. Here, we present a novel concept called artificial intelligence assisted contour revision (AIACR) and demonstrate its feasibility. The proposed clinical workflow of AIACR is as follows: given an initial contour that requires a clinician's revision, the clinician indicates where a large revision is needed, and a trained deep learning (DL) model takes this input to update the contour. This process repeats until a clinically acceptable contour is achieved. The DL model is designed to minimize the clinician's input at each iteration and to minimize the number of iterations needed to reach acceptance. In this proof-of-concept study, we demonstrated the concept on 2D axial images of three head-and-neck cancer datasets, with the clinician's input at each iteration being one mouse click on the desired location of the contour segment. The performance of the model is quantified with Dice Similarity Coefficient (DSC) and 95th percentile of Hausdorff Distance (HD95). The average DSC/HD95 (mm) of the auto-generated initial contours were 0.82/4.3, 0.73/5.6 and 0.67/11.4 for three datasets, which were improved to 0.91/2.1, 0.86/2.4 and 0.86/4.7 with three mouse clicks, respectively. Each DL-based contour update requires around 20 ms.
We proposed a novel AIACR concept that uses DL models to assist clinicians in revising contours in an efficient and effective way, and we demonstrated its feasibility by using 2D axial CT images from three head-and-neck cancer datasets.

Submitted 28 July, 2021; originally announced July 2021.

arXiv:2107.02643 [pdf, other]

Detecting Hypo-plastic Left Heart Syndrome in Fetal Ultrasound via Disease-specific Atlas Maps

Authors: Samuel Budd, Matthew Sinclair, Thomas Day, Athanasios Vlontzos, Jeremy Tan, Tianrui Liu, Jaqueline Matthew, Emily Skelton, John Simpson, Reza Razavi, Ben Glocker, Daniel Rueckert, Emma C. Robinson, Bernhard Kainz

Abstract: Fetal ultrasound screening during pregnancy plays a vital role in the early detection of fetal malformations which have potential long-term health impacts. The level of skill required to diagnose such malformations from live ultrasound during examination is high and resources for screening are often limited. We present an interpretable, atlas-learning segmentation method for automatic diagnosis of Hypo-plastic Left Heart Syndrome (HLHS) from a single '4 Chamber Heart' view image. We propose to extend the recently introduced Image-and-Spatial Transformer Networks (Atlas-ISTN) into a framework that enables sensitising atlas generation to disease. In this framework we can jointly learn image segmentation, registration, atlas construction and disease prediction while providing a maximum level of clinical interpretability compared to direct image classification methods. As a result our segmentation allows diagnoses competitive with expert-derived manual diagnosis and yields an AUC-ROC of 0.978 (1043 cases for training, 260 for validation and 325 for testing).

Submitted 6 July, 2021; originally announced July 2021.

Comments: MICCAI'21 Main Conference

arXiv:2102.09616 [pdf]

Deep learning-based COVID-19 pneumonia classification using chest CT images: model generalizability

Authors: Dan Nguyen, Fernando Kay, Jun Tan, Yulong Yan, Yee Seng Ng, Puneeth Iyengar, Ron Peshock, Steve Jiang

Abstract: Since the outbreak of the COVID-19 pandemic, worldwide research efforts have focused on using artificial intelligence (AI) technologies on various medical data of COVID-19-positive patients in order to identify or classify various aspects of the disease, with promising reported results. However, concerns have been raised over their generalizability, given the heterogeneous factors in training datasets. This study aims to examine the severity of this problem by evaluating deep learning (DL) classification models trained to identify COVID-19-positive patients on 3D computed tomography (CT) datasets from different countries. We collected one dataset at UT Southwestern (UTSW), and three external datasets from different countries: CC-CCII Dataset (China), COVID-CTset (Iran), and MosMedData (Russia). We divided the data into 2 classes: COVID-19-positive and COVID-19-negative patients. We trained nine identical DL-based classification models by using combinations of the datasets with a 72% train, 8% validation, and 20% test data split. The models trained on a single dataset achieved accuracy/area under the receiver operating characteristics curve (AUC) values of 0.87/0.826 (UTSW), 0.97/0.988 (CC-CCII), and 0.86/0.873 (COVID-CTset) when evaluated on their own dataset. The models trained on multiple datasets and evaluated on a test set from one of the datasets used for training performed better. However, the performance dropped close to an AUC of 0.5 (random guess) for all models when evaluated on a different dataset outside of its training datasets.
Including the MosMedData, which only contained positive labels, into the training did not necessarily help the performance on the other datasets. Multiple factors likely contribute to these results, such as patient demographics and differences in image acquisition or reconstruction, causing a data shift among different study cohorts.

Submitted 18 February, 2021; originally announced February 2021.

arXiv:2101.05317 [pdf, other]

Learning and Fast Adaptation for Grid Emergency Control via Deep Meta Reinforcement Learning

Authors: Renke Huang, Yujiao Chen, Tianzhixi Yin, Qiuhua Huang, Jie Tan, Wenhao Yu, Xinya Li, Ang Li, Yan Du

Abstract: As power systems are undergoing a significant transformation with more uncertainties, less inertia and closer to operation limits, there is increasing risk of large outages. Thus, there is an imperative need to enhance grid emergency control to maintain system reliability and security. Towards this end, great progress has been made in developing deep reinforcement learning (DRL) based grid control solutions in recent years. However, existing DRL-based solutions have two main limitations: 1) they cannot cope well with a wide range of grid operation conditions, system parameters, and contingencies; 2) they generally lack the ability to adapt quickly to new grid operation conditions, system parameters, and contingencies, limiting their applicability for real-world applications. In this paper, we mitigate these limitations by developing a novel deep meta reinforcement learning (DMRL) algorithm. The DMRL combines the meta strategy optimization together with DRL, and trains policies modulated by a latent space that can quickly adapt to new scenarios. We test the developed DMRL algorithm on the IEEE 300-bus system. We demonstrate fast adaptation of the meta-trained DRL policies with latent variables to new operating conditions and scenarios using the proposed method and achieve superior performance compared to the state-of-the-art DRL and model predictive control (MPC) methods.

Submitted 5 February, 2022; v1 submitted 13 January, 2021; originally announced January 2021.

arXiv:2012.03679 [pdf, other]

Learning normal appearance for fetal anomaly screening: Application to the unsupervised detection of Hypoplastic Left Heart Syndrome

Authors: Elisa Chotzoglou, Thomas Day, Jeremy Tan, Jacqueline Matthew, David Lloyd, Reza Razavi, John Simpson, Bernhard Kainz

Abstract: Congenital heart disease is considered one of the most common groups of congenital malformations, affecting $6-11$ per $1000$ newborns. In this work, an automated framework for detection of cardiac anomalies during ultrasound screening is proposed and evaluated on the example of Hypoplastic Left Heart Syndrome (HLHS), a sub-category of congenital heart disease. We propose an unsupervised approach that learns healthy anatomy exclusively from clinically confirmed normal control patients. We evaluate a number of known anomaly detection frameworks together with a model architecture based on the $α$-GAN network and find evidence that the proposed model performs significantly better than the state-of-the-art in image-based anomaly detection, yielding average $0.81$ AUC and a better robustness towards initialisation compared to previous works.

Submitted 9 September, 2021; v1 submitted 15 November, 2020; originally announced December 2020.

arXiv:2011.09664 [pdf, other]

Safe Reinforcement Learning for Emergency LoadShedding of Power Systems

Authors: Thanh Long Vu, Sayak Mukherjee, Tim Yin, Renke Huang, and Jie Tan, Qiuhua Huang

Abstract: The paradigm shift in the electric power grid necessitates a revisit of existing control methods to ensure the grid's security and resilience. In particular, the increased uncertainties and rapidly changing operational conditions in power systems have revealed outstanding issues in terms of either speed, adaptiveness, or scalability of the existing control methods for power systems. On the other hand, the availability of massive real-time data can provide a clearer picture of what is happening in the grid. Recently, deep reinforcement learning (RL) has been regarded and adopted as a promising approach leveraging massive data for fast and adaptive grid control. However, like most existing machine learning (ML)-based control techniques, RL control usually cannot guarantee the safety of the systems under control. In this paper, we introduce a novel method for safe RL-based load shedding of power systems that can enhance the safe voltage recovery of the electric power grid after experiencing faults. Numerical simulations on the 39-bus IEEE benchmark are performed to demonstrate the effectiveness of the proposed safe RL emergency control, as well as its adaptive capability to faults not seen in the training.

Submitted 17 November, 2020; originally announced November 2020.

Comments: arXiv admin note: text overlap with arXiv:2006.12667

arXiv:2010.13975 [pdf, other]

Wearing a MASK: Compressed Representations of Variable-Length Sequences Using Recurrent Neural Tangent Kernels

Authors: Sina Alemohammad, Hossein Babaei, Randall Balestriero, Matt Y. Cheung, Ahmed Imtiaz Humayun, Daniel LeJeune, Naiming Liu, Lorenzo Luzi, Jasper Tan, Zichao Wang, Richard G. Baraniuk

Abstract: High dimensionality poses many challenges to the use of data, from visualization and interpretation, to prediction and storage for historical preservation. Techniques abound to reduce the dimensionality of fixed-length sequences, yet these methods rarely generalize to variable-length sequences. To address this gap, we extend existing methods that rely on the use of kernels to variable-length sequences via use of the Recurrent Neural Tangent Kernel (RNTK). Since a deep neural network with ReLU activation is a Max-Affine Spline Operator (MASO), we dub our approach Max-Affine Spline Kernel (MASK). We demonstrate how MASK can be used to extend principal components analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) and apply these new algorithms to separate synthetic time series data sampled from second-order differential equations.

Submitted 17 April, 2021; v1 submitted 26 October, 2020; originally announced October 2020.

arXiv:2008.07358 [pdf, other]

SoftPoolNet: Shape Descriptor for Point Cloud Completion and Classification

Authors: Yida Wang, David Joseph Tan, Nassir Navab, Federico Tombari

Abstract: Point clouds are often the default choice for many applications as they exhibit more flexibility and efficiency than volumetric data. Nevertheless, their unorganized nature -- points are stored in an unordered way -- makes them less suited to be processed by deep learning pipelines. In this paper, we propose a method for 3D object completion and classification based on point clouds. We introduce a new way of organizing the extracted features based on their activations, which we name soft pooling. For the decoder stage, we propose regional convolutions, a novel operator aimed at maximizing the global activation entropy. Furthermore, inspired by the local refining procedure in Point Completion Network (PCN), we also propose a patch-deforming operation to simulate deconvolutional operations for point clouds. This paper proves that our regional activation can be incorporated in many point cloud architectures like AtlasNet and PCN, leading to better performance for geometric completion. We evaluate our approach on different 3D tasks such as object completion and classification, achieving state-of-the-art accuracy.

Submitted 17 August, 2020; originally announced August 2020.

Comments: accepted in ECCV 2020 as oral

arXiv:2008.06966 [pdf, other]

Automated Detection of Congenital Heart Disease in Fetal Ultrasound Screening

Authors: Jeremy Tan, Anselm Au, Qingjie Meng, Sandy FinesilverSmith, John Simpson, Daniel Rueckert, Reza Razavi, Thomas Day, David Lloyd, Bernhard Kainz

Abstract: Prenatal screening with ultrasound can lower neonatal mortality significantly for selected cardiac abnormalities. However, the need for human expertise, coupled with the high volume of screening cases, limits the practically achievable detection rates. In this paper we discuss the potential for deep learning techniques to aid in the detection of congenital heart disease (CHD) in fetal ultrasound. We propose a pipeline for automated data curation and classification. During both training and inference, we exploit an auxiliary view classification task to bias features toward relevant cardiac structures. This bias helps to improve F1-scores from 0.72 and 0.77 to 0.87 and 0.85 for healthy and CHD classes respectively.

Submitted 17 August, 2020; v1 submitted 16 August, 2020; originally announced August 2020.

arXiv:2007.03260 [pdf, other]

ResRep: Lossless CNN Pruning via Decoupling Remembering and Forgetting

Authors: Xiaohan Ding, Tianxiang Hao, Jianchao Tan, Ji Liu, Jungong Han, Yuchen Guo, Guiguang Ding

Abstract: We propose ResRep, a novel method for lossless channel pruning (a.k.a. filter pruning), which slims down a CNN by reducing the width (number of output channels) of convolutional layers. Inspired by the neurobiology research about the independence of remembering and forgetting, we propose to re-parameterize a CNN into the remembering parts and forgetting parts, where the former learn to maintain the performance and the latter learn to prune. Via training with regular SGD on the former but a novel update rule with penalty gradients on the latter, we realize structured sparsity. Then we equivalently merge the remembering and forgetting parts into the original architecture with narrower layers. In this sense, ResRep can be viewed as a successful application of Structural Re-parameterization. Such a methodology distinguishes ResRep from the traditional learning-based pruning paradigm that applies a penalty on parameters to produce sparsity, which may suppress the parameters essential for the remembering. ResRep slims down a standard ResNet-50 with 76.15% accuracy on ImageNet to a narrower one with only 45% FLOPs and no accuracy drop, which is the first to achieve lossless pruning with such a high compression ratio. The code and models are at https://github.com/DingXiaoH/ResRep.

Submitted 14 August, 2021; v1 submitted 7 July, 2020; originally announced July 2020.

Comments: ICCV 2021

arXiv:2006.12667 [pdf, other]

Accelerated Deep Reinforcement Learning Based Load Shedding for Emergency Voltage Control

Authors: Renke Huang, Yujiao Chen, Tianzhixi Yin, Xinya Li, Ang Li, Jie Tan, Wenhao Yu, Yuan Liu, Qiuhua Huang

Abstract: Load shedding has been one of the most widely used and effective emergency control approaches against voltage instability. With increased uncertainties and rapidly changing operational conditions in power systems, existing methods have outstanding issues in terms of either speed, adaptiveness, or scalability. Deep reinforcement learning (DRL) was regarded and adopted as a promising approach for fast and adaptive grid stability control in recent years. However, existing DRL algorithms show two outstanding issues when being applied to power system control problems: 1) computational inefficiency that requires extensive training and tuning time; and 2) poor scalability making it difficult to scale to high dimensional control problems. To overcome these issues, an accelerated DRL algorithm named PARS was developed and tailored for power system voltage stability control via load shedding. PARS features high scalability and is easy to tune with only five main hyperparameters. The method was tested on both the IEEE 39-bus and IEEE 300-bus systems, and the latter is by far the largest scale for such a study. Test results show that, compared to other methods including model-predictive control (MPC) and proximal policy optimization (PPO) methods, PARS shows better computational efficiency (faster convergence), more robustness in learning, excellent scalability and generalization capability.

Submitted 5 December, 2020; v1 submitted 22 June, 2020; originally announced June 2020.

arXiv:2006.11636 [pdf, other]

Joint Demosaicking / Rectification of Fisheye Camera Images using Multi-color Graph Laplacian Regularization

Authors: Fengbo Lan, Cheng Yang, Gene Cheung, Jack Z. G. Tan

Abstract: To compose a 360 image from a rig with multiple fisheye cameras, a conventional processing pipeline first performs demosaicking on each fisheye camera's Bayer-patterned grid, then translates demosaicked pixels from the camera grid to a rectified image grid---thus performing two image interpolation steps in sequence. Hence interpolation errors can accumulate, and acquisition noise in the captured pixels can pollute neighbors in two consecutive processing stages. In this paper, we propose a joint processing framework that performs demosaicking and grid-to-grid mapping simultaneously---thus limiting noise pollution to one interpolation. Specifically, we first obtain a reverse mapping function from a regular on-grid location in the rectified image to an irregular off-grid location in the camera's Bayer-patterned image. For each pair of adjacent pixels in the rectified grid, we estimate its gradient using the pair's neighboring pixel gradients in three colors in the Bayer-patterned grid. We construct a similarity graph based on the estimated gradients, and interpolate pixels in the rectified grid directly via graph Laplacian regularization (GLR). Experiments show that our joint method outperforms several competing local methods that execute demosaicking and rectification in sequence, by up to 0.52 dB in PSNR and 0.086 in SSIM on the publicly available dataset, and by up to 5.53 dB in PSNR and 0.411 in SSIM on the in-house constructed dataset.

Submitted 20 June, 2020; originally announced June 2020.

arXiv:2005.10752 [pdf, ps, other]

THz Precoding for 6G: Applications, Challenges, Solutions, and Opportunities

Authors: Jingbo Tan, Linglong Dai

Abstract: Benefiting from the ultra-wide bandwidth, terahertz (THz) communication is becoming a promising technology for future 6G networks. For THz communication, precoding is an essential technique to overcome the severe path loss of THz signals in order to support the desired coverage. In this article, we systematically investigate the dominant THz precoding techniques for future 6G networks, with a focus on their key challenges and opportunities. Specifically, we first illustrate three typical THz application scenarios including indoor, mobile, and satellite communications. Then, the major differences between millimeter-wave and THz channels are explicitly clarified, based on which we reveal the key challenges of THz precoding, such as the distance-dependent path loss, the beam split effect, and the high power consumption. To address these challenges, three representative THz precoding techniques, i.e., analog beamforming, hybrid precoding, and delay-phase precoding, are extensively investigated in terms of their different structures, designs, most recent results, pros and cons. We also provide simulation results of spectrum and energy efficiencies to compare these typical THz precoding schemes and draw some insights for their applications in future 6G networks. Finally, several important open issues and the potential research opportunities, such as the use of reconfigurable intelligent surface (RIS) to solve the THz blockage problem, are pointed out and discussed.

Submitted 21 May, 2020; originally announced May 2020.

Comments: 7 pages, 5 figures

Showing 1–50 of 60 results for author: Tan, J