Search | arXiv e-print repository

Towards a Quantitative Analysis of Coarticulation with a Phoneme-to-Articulatory Model

Authors: Chaofei Fan, Jaimie M. Henderson, Chris Manning, Francis R. Willett

Abstract: Prior coarticulation studies focus mainly on limited phonemic sequences and specific articulators, providing only approximate descriptions of the temporal extent and magnitude of coarticulation. This paper is an initial attempt to comprehensively investigate coarticulation. We leverage existing Electromagnetic Articulography (EMA) datasets to develop and train a phoneme-to-articulatory (P2A) model… ▽ More Prior coarticulation studies focus mainly on limited phonemic sequences and specific articulators, providing only approximate descriptions of the temporal extent and magnitude of coarticulation. This paper is an initial attempt to comprehensively investigate coarticulation. We leverage existing Electromagnetic Articulography (EMA) datasets to develop and train a phoneme-to-articulatory (P2A) model that can generate realistic EMA for novel phoneme sequences and replicate known coarticulation patterns. We use model-generated EMA on 9K minimal word pairs to analyze coarticulation magnitude and extent up to eight phonemes from the coarticulation trigger, and compare coarticulation resistance across different consonants. Our findings align with earlier studies and suggest a longer-range coarticulation effect than previously found. This model-based approach can potentially compare coarticulation between adults and children and across languages, offering new insights into speech production. △ Less

Submitted 10 August, 2024; originally announced August 2024.

Comments: To be published in Interspeech 2024

arXiv:2407.13895 [pdf, other]

Improving Robustness and Clinical Applicability of Respiratory Sound Classification via Audio Enhancement

Authors: Jing-Tong Tzeng, Jeng-Lin Li, Huan-Yu Chen, Chun-Hsiang Huang, Chi-Hsin Chen, Cheng-Yi Fan, Edward Pei-Chuan Huang, Chi-Chun Lee

Abstract: Deep learning techniques have shown promising results in the automatic classification of respiratory sounds. However, accurately distinguishing these sounds in real-world noisy conditions poses challenges for clinical deployment. Additionally, predicting signals with only background noise could undermine user trust in the system. In this study, we propose an audio enhancement (AE) pipeline as a pr… ▽ More Deep learning techniques have shown promising results in the automatic classification of respiratory sounds. However, accurately distinguishing these sounds in real-world noisy conditions poses challenges for clinical deployment. Additionally, predicting signals with only background noise could undermine user trust in the system. In this study, we propose an audio enhancement (AE) pipeline as a pre-processing step before respiratory sound classification, aiming to improve performance in noisy environments. Multiple experiments were conducted using different audio enhancement model structures, demonstrating improved classification performance compared to the baseline method of noise injection data augmentation. Specifically, the integration of the AE pipeline resulted in a 2.59% increase in the ICBHI classification score on the ICBHI respiratory sound dataset and a 2.51% improvement on our recently collected Formosa Archive of Breath Sounds (FABS) in multi-class noisy scenarios. Furthermore, a physician validation study assessed the clinical utility of our system. Quantitative analysis revealed enhancements in efficiency, diagnostic confidence, and trust during model-assisted diagnosis with our system compared to raw noisy recordings. Workflows integrating enhanced audio led to an 11.61% increase in diagnostic sensitivity and facilitated high-confidence diagnoses. Our findings demonstrate that incorporating an audio enhancement algorithm significantly enhances robustness and clinical utility. △ Less

Submitted 18 July, 2024; originally announced July 2024.

Comments: The following article has been submitted to The Journal of the Acoustical Society of America (JASA). After it is published, it will be found at https://pubs.aip.org/asa/jasa

arXiv:2407.08481 [pdf, other]

SliceMamba with Neural Architecture Search for Medical Image Segmentation

Authors: Chao Fan, Hongyuan Yu, Yan Huang, Liang Wang, Zhenghan Yang, Xibin Jia

Abstract: Despite the progress made in Mamba-based medical image segmentation models, existing methods utilizing unidirectional or multi-directional feature scanning mechanisms struggle to effectively capture dependencies between neighboring positions, limiting the discriminant representation learning of local features. These local features are crucial for medical image segmentation as they provide critical… ▽ More Despite the progress made in Mamba-based medical image segmentation models, existing methods utilizing unidirectional or multi-directional feature scanning mechanisms struggle to effectively capture dependencies between neighboring positions, limiting the discriminant representation learning of local features. These local features are crucial for medical image segmentation as they provide critical structural information about lesions and organs. To address this limitation, we propose SliceMamba, a simple and effective locally sensitive Mamba-based medical image segmentation model. SliceMamba includes an efficient Bidirectional Slice Scan module (BSS), which performs bidirectional feature slicing and employs varied scanning mechanisms for sliced features with distinct shapes. This design ensures that spatially adjacent features remain close in the scanning sequence, thereby improving segmentation performance. Additionally, to fit the varying sizes and shapes of lesions and organs, we further introduce an Adaptive Slice Search method to automatically determine the optimal feature slice method based on the characteristics of the target data. Extensive experiments on two skin lesion datasets (ISIC2017 and ISIC2018), two polyp segmentation (Kvasir and ClinicDB) datasets, and one multi-organ segmentation dataset (Synapse) validate the effectiveness of our method. △ Less

Submitted 19 August, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2407.05726 [pdf, other]

Gait Patterns as Biomarkers: A Video-Based Approach for Classifying Scoliosis

Authors: Zirui Zhou, Junhao Liang, Zizhao Peng, Chao Fan, Fengwei An, Shiqi Yu

Abstract: Scoliosis presents significant diagnostic challenges, particularly in adolescents, where early detection is crucial for effective treatment. Traditional diagnostic and follow-up methods, which rely on physical examinations and radiography, face limitations due to the need for clinical expertise and the risk of radiation exposure, thus restricting their use for widespread early screening. In respon… ▽ More Scoliosis presents significant diagnostic challenges, particularly in adolescents, where early detection is crucial for effective treatment. Traditional diagnostic and follow-up methods, which rely on physical examinations and radiography, face limitations due to the need for clinical expertise and the risk of radiation exposure, thus restricting their use for widespread early screening. In response, we introduce a novel video-based, non-invasive method for scoliosis classification using gait analysis, effectively circumventing these limitations. This study presents Scoliosis1K, the first large-scale dataset specifically designed for video-based scoliosis classification, encompassing over one thousand adolescents. Leveraging this dataset, we developed ScoNet, an initial model that faced challenges in handling the complexities of real-world data. This led to the development of ScoNet-MT, an enhanced model incorporating multi-task learning, which demonstrates promising diagnostic accuracy for practical applications. Our findings demonstrate that gait can serve as a non-invasive biomarker for scoliosis, revolutionizing screening practices through deep learning and setting a precedent for non-invasive diagnostic methodologies. The dataset and code are publicly available at https://zhouzi180.github.io/Scoliosis1K/. △ Less

Submitted 23 August, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

Comments: Accepted to MICCAI 2024

arXiv:2406.09664 [pdf, other]

Frequency-mix Knowledge Distillation for Fake Speech Detection

Authors: Cunhang Fan, Shunbo Dong, Jun Xue, Yujie Chen, Jiangyan Yi, Zhao Lv

Abstract: In the telephony scenarios, the fake speech detection (FSD) task to combat speech spoofing attacks is challenging. Data augmentation (DA) methods are considered effective means to address the FSD task in telephony scenarios, typically divided into time domain and frequency domain stages. While each has its advantages, both can result in information loss. To tackle this issue, we propose a novel DA… ▽ More In the telephony scenarios, the fake speech detection (FSD) task to combat speech spoofing attacks is challenging. Data augmentation (DA) methods are considered effective means to address the FSD task in telephony scenarios, typically divided into time domain and frequency domain stages. While each has its advantages, both can result in information loss. To tackle this issue, we propose a novel DA method, Frequency-mix (Freqmix), and introduce the Freqmix knowledge distillation (FKD) to enhance model information extraction and generalization abilities. Specifically, we use Freqmix-enhanced data as input for the teacher model, while the student model's input undergoes time-domain DA method. We use a multi-level feature distillation approach to restore information and improve the model's generalization capabilities. Our approach achieves state-of-the-art results on ASVspoof 2021 LA dataset, showing a 31\% improvement over baseline and performs competitively on ASVspoof 2021 DF dataset. △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: Accepted by Interspeech 2024

arXiv:2406.06086 [pdf, other]

RawBMamba: End-to-End Bidirectional State Space Model for Audio Deepfake Detection

Authors: Yujie Chen, Jiangyan Yi, Jun Xue, Chenglong Wang, Xiaohui Zhang, Shunbo Dong, Siding Zeng, Jianhua Tao, Lv Zhao, Cunhang Fan

Abstract: Fake artefacts for discriminating between bonafide and fake audio can exist in both short- and long-range segments. Therefore, combining local and global feature information can effectively discriminate between bonafide and fake audio. This paper proposes an end-to-end bidirectional state space model, named RawBMamba, to capture both short- and long-range discriminative information for audio deepf… ▽ More Fake artefacts for discriminating between bonafide and fake audio can exist in both short- and long-range segments. Therefore, combining local and global feature information can effectively discriminate between bonafide and fake audio. This paper proposes an end-to-end bidirectional state space model, named RawBMamba, to capture both short- and long-range discriminative information for audio deepfake detection. Specifically, we use sinc Layer and multiple convolutional layers to capture short-range features, and then design a bidirectional Mamba to address Mamba's unidirectional modelling problem and further capture long-range feature information. Moreover, we develop a bidirectional fusion module to integrate embeddings, enhancing audio context representation and combining short- and long-range information. The results show that our proposed RawBMamba achieves a 34.1\% improvement over Rawformer on ASVspoof2021 LA dataset, and demonstrates competitive performance on other datasets. △ Less

Submitted 18 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

Comments: Accepted by Interspeech 2024

arXiv:2405.18554 [pdf, other]

Scalable Surrogate Verification of Image-based Neural Network Control Systems using Composition and Unrolling

Authors: Feiyang Cai, Chuchu Fan, Stanley Bak

Abstract: Verifying safety of neural network control systems that use images as input is a difficult problem because, from a given system state, there is no known way to mathematically model what images are possible in the real-world. We build on recent work that considers a surrogate verification approach, training a conditional generative adversarial network (cGAN) as an image generator in place of the re… ▽ More Verifying safety of neural network control systems that use images as input is a difficult problem because, from a given system state, there is no known way to mathematically model what images are possible in the real-world. We build on recent work that considers a surrogate verification approach, training a conditional generative adversarial network (cGAN) as an image generator in place of the real world. This enables set-based formal analysis of the closed-loop system, providing analysis beyond simulation and testing. While existing work is effective on small examples, excessive overapproximation both within a single control period and across multiple control periods limits its scalability. We propose approaches to overcome these two sources of error. First, we overcome one-step error by composing the system's dynamics along with the cGAN and neural network controller, without losing the dependencies between input states and the control outputs as in the monotonic analysis of the system dynamics. Second, we reduce multi-step error by repeating the single-step composition, essentially unrolling multiple steps of the control loop into a large neural network. We then leverage existing network verification tools to compute accurate reachable sets for multiple steps, avoiding the accumulation of abstraction error at each step. We demonstrate the effectiveness of our approach in terms of both accuracy and scalability using two case studies: an autonomous aircraft taxiing system and an advanced emergency braking system. On the aircraft taxiing system, the converged reachable set is 175% larger using the prior baseline method compared with our proposed approach. On the emergency braking system, with 24x the number of image output variables from the cGAN, the baseline method fails to prove any states are safe, whereas our improvements enable set-based safety analysis. △ Less

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2404.10343 [pdf, other]

The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang , et al. (109 additional authors not shown)

Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such… ▽ More This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such as runtime, parameters, and FLOPs, while still maintaining a peak signal-to-noise ratio (PSNR) of approximately 26.90 dB on the DIV2K_LSDIR_valid dataset and 26.99 dB on the DIV2K_LSDIR_test dataset. In addition, this challenge has 4 tracks including the main track (overall performance), sub-track 1 (runtime), sub-track 2 (FLOPs), and sub-track 3 (parameters). In the main track, all three metrics (ie runtime, FLOPs, and parameter count) were considered. The ranking of the main track is calculated based on a weighted sum-up of the scores of all other sub-tracks. In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking. In sub-track 2, the number of FLOPs was considered. The score calculated based on the corresponding FLOPs was used to determine the ranking. In sub-track 3, the number of parameters was considered. The score calculated based on the corresponding parameters was used to determine the ranking. RLFN is set as the baseline for efficiency measurement. The challenge had 262 registered participants, and 34 teams made valid submissions. They gauge the state-of-the-art in efficient single-image super-resolution. To facilitate the reproducibility of the challenge and enable other researchers to build upon these findings, the code and the pre-trained model of validated solutions are made publicly available at https://github.com/Amazingren/NTIRE2024_ESR/. △ Less

Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

arXiv:2311.13714 [pdf, other]

Learning Safe Control for Multi-Robot Systems: Methods, Verification, and Open Challenges

Authors: Kunal Garg, Songyuan Zhang, Oswin So, Charles Dawson, Chuchu Fan

Abstract: In this survey, we review the recent advances in control design methods for robotic multi-agent systems (MAS), focussing on learning-based methods with safety considerations. We start by reviewing various notions of safety and liveness properties, and modeling frameworks used for problem formulation of MAS. Then we provide a comprehensive review of learning-based methods for safe control design fo… ▽ More In this survey, we review the recent advances in control design methods for robotic multi-agent systems (MAS), focussing on learning-based methods with safety considerations. We start by reviewing various notions of safety and liveness properties, and modeling frameworks used for problem formulation of MAS. Then we provide a comprehensive review of learning-based methods for safe control design for multi-robot systems. We start with various types of shielding-based methods, such as safety certificates, predictive filters, and reachability tools. Then, we review the current state of control barrier certificate learning in both a centralized and distributed manner, followed by a comprehensive review of multi-agent reinforcement learning with a particular focus on safety. Next, we discuss the state-of-the-art verification tools for the correctness of learning-based methods. Based on the capabilities and the limitations of the state of the art methods in learning and verification for MAS, we identify various broad themes for open challenges: how to design methods that can achieve good performance along with safety guarantees; how to decompose single-agent based centralized methods for MAS; how to account for communication-related practical issues; and how to assess transfer of theoretical guarantees to practice. △ Less

Submitted 22 November, 2023; originally announced November 2023.

Comments: Submitted to Annual Reviews in Control

arXiv:2311.13014 [pdf, other]

Neural Graph Control Barrier Functions Guided Distributed Collision-avoidance Multi-agent Control

Authors: Songyuan Zhang, Kunal Garg, Chuchu Fan

Abstract: We consider the problem of designing distributed collision-avoidance multi-agent control in large-scale environments with potentially moving obstacles, where a large number of agents are required to maintain safety using only local information and reach their goals. This paper addresses the problem of collision avoidance, scalability, and generalizability by introducing graph control barrier funct… ▽ More We consider the problem of designing distributed collision-avoidance multi-agent control in large-scale environments with potentially moving obstacles, where a large number of agents are required to maintain safety using only local information and reach their goals. This paper addresses the problem of collision avoidance, scalability, and generalizability by introducing graph control barrier functions (GCBFs) for distributed control. The newly introduced GCBF is based on the well-established CBF theory for safety guarantees but utilizes a graph structure for scalable and generalizable decentralized control. We use graph neural networks to learn both neural a GCBF certificate and distributed control. We also extend the framework from handling state-based models to directly taking point clouds from LiDAR for more practical robotics settings. We demonstrated the efficacy of GCBF in a variety of numerical experiments, where the number, density, and traveling distance of agents, as well as the number of unseen and uncontrolled obstacles increase. Empirical results show that GCBF outperforms leading methods such as MAPPO and multi-agent distributed CBF (MDCBF). Trained with only 16 agents, GCBF can achieve up to 3 times improvement of success rate (agents reach goals and never encountered in any collisions) on <500 agents, and still maintain more than 50% success rates for >1000 agents when other methods completely fail. △ Less

Submitted 21 November, 2023; originally announced November 2023.

Comments: 20 pages, 10 figures; Accepted by 7th Conference on Robot Learning (CoRL 2023)

arXiv:2310.08869 [pdf, other]

doi 10.1109/TASLP.2024.3389643

Dual-Branch Knowledge Distillation for Noise-Robust Synthetic Speech Detection

Authors: Cunhang Fan, Mingming Ding, Jianhua Tao, Ruibo Fu, Jiangyan Yi, Zhengqi Wen, Zhao Lv

Abstract: Most research in synthetic speech detection (SSD) focuses on improving performance on standard noise-free datasets. However, in actual situations, noise interference is usually present, causing significant performance degradation in SSD systems. To improve noise robustness, this paper proposes a dual-branch knowledge distillation synthetic speech detection (DKDSSD) method. Specifically, a parallel… ▽ More Most research in synthetic speech detection (SSD) focuses on improving performance on standard noise-free datasets. However, in actual situations, noise interference is usually present, causing significant performance degradation in SSD systems. To improve noise robustness, this paper proposes a dual-branch knowledge distillation synthetic speech detection (DKDSSD) method. Specifically, a parallel data flow of the clean teacher branch and the noisy student branch is designed, and interactive fusion module and response-based teacher-student paradigms are proposed to guide the training of noisy data from both the data distribution and decision-making perspectives. In the noisy student branch, speech enhancement is introduced initially for denoising, aiming to reduce the interference of strong noise. The proposed interactive fusion combines denoised features and noisy features to mitigate the impact of speech distortion and ensure consistency with the data distribution of the clean branch. The teacher-student paradigm maps the student's decision space to the teacher's decision space, enabling noisy speech to behave similarly to clean speech. Additionally, a joint training method is employed to optimize both branches for achieving global optimality. Experimental results based on multiple datasets demonstrate that the proposed method performs effectively in noisy environments and maintains its performance in cross-dataset experiments. Source code is available at https://github.com/fchest/DKDSSD. △ Less

Submitted 16 April, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

arXiv:2310.06956 [pdf, other]

Adversarial optimization leads to over-optimistic security-constrained dispatch, but sampling can help

Authors: Charles Dawson, Chuchu Fan

Abstract: To ensure safe, reliable operation of the electrical grid, we must be able to predict and mitigate likely failures. This need motivates the classic security-constrained AC optimal power flow (SCOPF) problem. SCOPF is commonly solved using adversarial optimization, where the dispatcher and an adversary take turns optimizing a robust dispatch and adversarial attack, respectively. We show that advers… ▽ More To ensure safe, reliable operation of the electrical grid, we must be able to predict and mitigate likely failures. This need motivates the classic security-constrained AC optimal power flow (SCOPF) problem. SCOPF is commonly solved using adversarial optimization, where the dispatcher and an adversary take turns optimizing a robust dispatch and adversarial attack, respectively. We show that adversarial optimization is liable to severely overestimate the robustness of the optimized dispatch (when the adversary encounters a local minimum), leading the operator to falsely believe that their dispatch is secure. To prevent this overconfidence, we develop a novel adversarial sampling approach that prioritizes diversity in the predicted attacks. We find that our method not only substantially improves the robustness of the optimized dispatch but also avoids overconfidence, accurately characterizing the likelihood of voltage collapse under a given threat model. We demonstrate a proof-of-concept on small-scale transmission systems with 14 and 57 nodes. △ Less

Submitted 10 October, 2023; originally announced October 2023.

Comments: Accepted at NAPS 2023

arXiv:2309.09108 [pdf, other]

Neural Network-based Fault Detection and Identification for Quadrotors using Dynamic Symmetry

Authors: Kunal Garg, Chuchu Fan

Abstract: Autonomous robotic systems, such as quadrotors, are susceptible to actuator faults, and for the safe operation of such systems, timely detection and isolation of these faults is essential. Neural networks can be used for verification of actuator performance via online actuator fault detection with high accuracy. In this paper, we develop a novel model-free fault detection and isolation (FDI) frame… ▽ More Autonomous robotic systems, such as quadrotors, are susceptible to actuator faults, and for the safe operation of such systems, timely detection and isolation of these faults is essential. Neural networks can be used for verification of actuator performance via online actuator fault detection with high accuracy. In this paper, we develop a novel model-free fault detection and isolation (FDI) framework for quadrotor systems using long-short-term memory (LSTM) neural network architecture. The proposed framework only uses system output data and the commanded control input and requires no knowledge of the system model. Utilizing the symmetry in quadrotor dynamics, we train the FDI for fault in just one of the motors (e.g., motor $\# 2$), and the trained FDI can predict faults in any of the motors. This reduction in search space enables us to design an FDI for partial fault as well as complete fault scenarios. Numerical experiments illustrate that the proposed NN-FDI correctly verifies the actuator performance and identifies partial as well as complete faults with over $90\%$ prediction accuracy. We also illustrate that model-free NN-FDI performs at par with model-based FDI, and is robust to model uncertainties as well as distribution shifts in input data. △ Less

Submitted 16 September, 2023; originally announced September 2023.

Comments: Accepted for 2023 Allerton Conference on Communication, Control, & Computing

arXiv:2309.08052 [pdf, other]

A Bayesian approach to breaking things: efficiently predicting and repairing failure modes via sampling

Authors: Charles Dawson, Chuchu Fan

Abstract: Before autonomous systems can be deployed in safety-critical applications, we must be able to understand and verify the safety of these systems. For cases where the risk or cost of real-world testing is prohibitive, we propose a simulation-based framework for a) predicting ways in which an autonomous system is likely to fail and b) automatically adjusting the system's design to preemptively mitiga… ▽ More Before autonomous systems can be deployed in safety-critical applications, we must be able to understand and verify the safety of these systems. For cases where the risk or cost of real-world testing is prohibitive, we propose a simulation-based framework for a) predicting ways in which an autonomous system is likely to fail and b) automatically adjusting the system's design to preemptively mitigate those failures. We frame this problem through the lens of approximate Bayesian inference and use differentiable simulation for efficient failure case prediction and repair. We apply our approach on a range of robotics and control problems, including optimizing search patterns for robot swarms and reducing the severity of outages in power transmission networks. Compared to optimization-based falsification techniques, our method predicts a more diverse, representative set of failure modes, and we also find that our use of differentiable simulation yields solutions that have up to 10x lower cost and requires up to 2x fewer iterations to converge relative to gradient-free techniques. Code and videos can be found at https://mit-realm.github.io/breaking-things/ △ Less

Submitted 14 September, 2023; originally announced September 2023.

Comments: To appear at the 2023 Conference on Robot Learning (CoRL)

arXiv:2309.07147 [pdf, other]

DGSD: Dynamical Graph Self-Distillation for EEG-Based Auditory Spatial Attention Detection

Authors: Cunhang Fan, Hongyu Zhang, Wei Huang, Jun Xue, Jianhua Tao, Jiangyan Yi, Zhao Lv, Xiaopei Wu

Abstract: Auditory Attention Detection (AAD) aims to detect target speaker from brain signals in a multi-speaker environment. Although EEG-based AAD methods have shown promising results in recent years, current approaches primarily rely on traditional convolutional neural network designed for processing Euclidean data like images. This makes it challenging to handle EEG signals, which possess non-Euclidean… ▽ More Auditory Attention Detection (AAD) aims to detect target speaker from brain signals in a multi-speaker environment. Although EEG-based AAD methods have shown promising results in recent years, current approaches primarily rely on traditional convolutional neural network designed for processing Euclidean data like images. This makes it challenging to handle EEG signals, which possess non-Euclidean characteristics. In order to address this problem, this paper proposes a dynamical graph self-distillation (DGSD) approach for AAD, which does not require speech stimuli as input. Specifically, to effectively represent the non-Euclidean properties of EEG signals, dynamical graph convolutional networks are applied to represent the graph structure of EEG signals, which can also extract crucial features related to auditory spatial attention in EEG signals. In addition, to further improve AAD detection performance, self-distillation, consisting of feature distillation and hierarchical distillation strategies at each layer, is integrated. These strategies leverage features and classification results from the deepest network layers to guide the learning of shallow layers. Our experiments are conducted on two publicly available datasets, KUL and DTU. Under a 1-second time window, we achieve results of 90.0\% and 79.6\% accuracy on KUL and DTU, respectively. We compare our DGSD method with competitive baselines, and the experimental results indicate that the detection performance of our proposed DGSD method is not only superior to the best reproducible baseline but also significantly reduces the number of trainable parameters by approximately 100 times. △ Less

Submitted 7 September, 2023; originally announced September 2023.

arXiv:2308.09944 [pdf, other]

doi 10.1016/j.neunet.2024.106320

Spatial Reconstructed Local Attention Res2Net with F0 Subband for Fake Speech Detection

Authors: Cunhang Fan, Jun Xue, Jianhua Tao, Jiangyan Yi, Chenglong Wang, Chengshi Zheng, Zhao Lv

Abstract: The rhythm of bonafide speech is often difficult to replicate, which causes that the fundamental frequency (F0) of synthetic speech is significantly different from that of real speech. It is expected that the F0 feature contains the discriminative information for the fake speech detection (FSD) task. In this paper, we propose a novel F0 subband for FSD. In addition, to effectively model the F0 sub… ▽ More The rhythm of bonafide speech is often difficult to replicate, which causes that the fundamental frequency (F0) of synthetic speech is significantly different from that of real speech. It is expected that the F0 feature contains the discriminative information for the fake speech detection (FSD) task. In this paper, we propose a novel F0 subband for FSD. In addition, to effectively model the F0 subband so as to improve the performance of FSD, the spatial reconstructed local attention Res2Net (SR-LA Res2Net) is proposed. Specifically, Res2Net is used as a backbone network to obtain multiscale information, and enhanced with a spatial reconstruction mechanism to avoid losing important information when the channel group is constantly superimposed. In addition, local attention is designed to make the model focus on the local information of the F0 subband. Experimental results on the ASVspoof 2019 LA dataset show that our proposed method obtains an equal error rate (EER) of 0.47% and a minimum tandem detection cost function (min t-DCF) of 0.0159, achieving the state-of-the-art performance among all of the single systems. △ Less

Submitted 8 July, 2024; v1 submitted 19 August, 2023; originally announced August 2023.

Comments: Accept by Neural Networks

arXiv:2306.15389 [pdf, other]

Multi-perspective Information Fusion Res2Net with RandomSpecmix for Fake Speech Detection

Authors: Shunbo Dong, Jun Xue, Cunhang Fan, Kang Zhu, Yujie Chen, Zhao Lv

Abstract: In this paper, we propose the multi-perspective information fusion (MPIF) Res2Net with random Specmix for fake speech detection (FSD). The main purpose of this system is to improve the model's ability to learn precise forgery information for FSD task in low-quality scenarios. The task of random Specmix, a data augmentation, is to improve the generalization ability of the model and enhance the mode… ▽ More In this paper, we propose the multi-perspective information fusion (MPIF) Res2Net with random Specmix for fake speech detection (FSD). The main purpose of this system is to improve the model's ability to learn precise forgery information for FSD task in low-quality scenarios. The task of random Specmix, a data augmentation, is to improve the generalization ability of the model and enhance the model's ability to locate discriminative information. Specmix cuts and pastes the frequency dimension information of the spectrogram in the same batch of samples without introducing other data, which helps the model to locate the really useful information. At the same time, we randomly select samples for augmentation to reduce the impact of data augmentation directly changing all the data. Once the purpose of helping the model to locate information is achieved, it is also important to reduce unnecessary information. The role of MPIF-Res2Net is to reduce redundant interference information. Deceptive information from a single perspective is always similar, so the model learning this similar information will produce redundant spoofing clues and interfere with truly discriminative information. The proposed MPIF-Res2Net fuses information from different perspectives, making the information learned by the model more diverse, thereby reducing the redundancy caused by similar information and avoiding interference with the learning of discriminative information. The results on the ASVspoof 2021 LA dataset demonstrate the effectiveness of our proposed method, achieving EER and min-tDCF of 3.29% and 0.2557, respectively. △ Less

Submitted 27 June, 2023; originally announced June 2023.

Comments: Accepted by DADA2023

arXiv:2306.08722 [pdf, other]

Learning to Stabilize High-dimensional Unknown Systems Using Lyapunov-guided Exploration

Authors: Songyuan Zhang, Chuchu Fan

Abstract: Designing stabilizing controllers is a fundamental challenge in autonomous systems, particularly for high-dimensional, nonlinear systems that can hardly be accurately modeled with differential equations. The Lyapunov theory offers a solution for stabilizing control systems, still, current methods relying on Lyapunov functions require access to complete dynamics or samples of system executions thro… ▽ More Designing stabilizing controllers is a fundamental challenge in autonomous systems, particularly for high-dimensional, nonlinear systems that can hardly be accurately modeled with differential equations. The Lyapunov theory offers a solution for stabilizing control systems, still, current methods relying on Lyapunov functions require access to complete dynamics or samples of system executions throughout the entire state space. Consequently, they are impractical for high-dimensional systems. This paper introduces a novel framework, LYapunov-Guided Exploration (LYGE), for learning stabilizing controllers tailored to high-dimensional, unknown systems. LYGE employs Lyapunov theory to iteratively guide the search for samples during exploration while simultaneously learning the local system dynamics, control policy, and Lyapunov functions. We demonstrate its scalability on highly complex systems, including a high-fidelity F-16 jet model featuring a 16D state space and a 4D input space. Experiments indicate that, compared to prior works in reinforcement learning, imitation learning, and neural certificates, LYGE reduces the distance to the goal by 50% while requiring only 5% to 32% of the samples. Furthermore, we demonstrate that our algorithm can be extended to learn controllers guided by other certificate functions for unknown systems. △ Less

Submitted 16 May, 2024; v1 submitted 14 June, 2023; originally announced June 2023.

Comments: 32 pages, 7 figures; Accepted by the 6th Annual Conference on Learning for Dynamics and Control (L4DC 2024)

arXiv:2303.14564 [pdf, other]

Compositional Neural Certificates for Networked Dynamical Systems

Authors: Songyuan Zhang, Yumeng Xiu, Guannan Qu, Chuchu Fan

Abstract: Developing stable controllers for large-scale networked dynamical systems is crucial but has long been challenging due to two key obstacles: certifiability and scalability. In this paper, we present a general framework to solve these challenges using compositional neural certificates based on ISS (Input-to-State Stability) Lyapunov functions. Specifically, we treat a large networked dynamical syst… ▽ More Developing stable controllers for large-scale networked dynamical systems is crucial but has long been challenging due to two key obstacles: certifiability and scalability. In this paper, we present a general framework to solve these challenges using compositional neural certificates based on ISS (Input-to-State Stability) Lyapunov functions. Specifically, we treat a large networked dynamical system as an interconnection of smaller subsystems and develop methods that can find each subsystem a decentralized controller and an ISS Lyapunov function; the latter can be collectively composed to prove the global stability of the system. To ensure the scalability of our approach, we develop generalizable and robust ISS Lyapunov functions where a single function can be used across different subsystems and the certificates we produced for small systems can be generalized to be used on large systems with similar structures. We encode both ISS Lyapunov functions and controllers as neural networks and propose a novel training methodology to handle the logic in ISS Lyapunov conditions that encodes the interconnection with neighboring subsystems. We demonstrate our approach in systems including Platoon, Drone formation control, and Power systems. Experimental results show that our framework can reduce the tracking error up to 75% compared with RL algorithms when applied to large-scale networked systems. △ Less

Submitted 11 April, 2023; v1 submitted 25 March, 2023; originally announced March 2023.

Comments: 25 pages, 8 figures; Accepted by 5th Annual Learning for Dynamics & Control Conference (L4DC) 2023

arXiv:2303.10327 [pdf, other]

Hybrid Systems Neural Control with Region-of-Attraction Planner

Authors: Yue Meng, Chuchu Fan

Abstract: Hybrid systems are prevalent in robotics. However, ensuring the stability of hybrid systems is challenging due to sophisticated continuous and discrete dynamics. A system with all its system modes stable can still be unstable. Hence special treatments are required at mode switchings to stabilize the system. In this work, we propose a hierarchical, neural network (NN)-based method to control genera… ▽ More Hybrid systems are prevalent in robotics. However, ensuring the stability of hybrid systems is challenging due to sophisticated continuous and discrete dynamics. A system with all its system modes stable can still be unstable. Hence special treatments are required at mode switchings to stabilize the system. In this work, we propose a hierarchical, neural network (NN)-based method to control general hybrid systems. For each system mode, we first learn an NN Lyapunov function and an NN controller to ensure the states within the region of attraction (RoA) can be stabilized. Then an RoA NN estimator is learned across different modes. Upon mode switching, we propose a differentiable planner to ensure the states after switching can land in next mode's RoA, hence stabilizing the hybrid system. We provide novel theoretical stability guarantees and conduct experiments in car tracking control, pogobot navigation, and bipedal walker locomotion. Our method only requires 0.25X of the training time as needed by other learning-based methods. With low running time (10-50X faster than model predictive control (MPC)), our controller achieves a higher stability/success rate over other baselines such as MPC, reinforcement learning (RL), common Lyapunov methods (CLF), linear quadratic regulator (LQR), quadratic programming (QP) and Hamilton-Jacobian-based methods (HJB). The project page is on https://mit-realm.github.io/hybrid-clf. △ Less

Submitted 18 March, 2023; originally announced March 2023.

Comments: Accepted to L4DC2023

arXiv:2303.01211 [pdf, other]

Learning From Yourself: A Self-Distillation Method for Fake Speech Detection

Authors: Jun Xue, Cunhang Fan, Jiangyan Yi, Chenglong Wang, Zhengqi Wen, Dan Zhang, Zhao Lv

Abstract: In this paper, we propose a novel self-distillation method for fake speech detection (FSD), which can significantly improve the performance of FSD without increasing the model complexity. For FSD, some fine-grained information is very important, such as spectrogram defects, mute segments, and so on, which are often perceived by shallow networks. However, shallow networks have much noise, which can… ▽ More In this paper, we propose a novel self-distillation method for fake speech detection (FSD), which can significantly improve the performance of FSD without increasing the model complexity. For FSD, some fine-grained information is very important, such as spectrogram defects, mute segments, and so on, which are often perceived by shallow networks. However, shallow networks have much noise, which can not capture this very well. To address this problem, we propose using the deepest network instruct shallow network for enhancing shallow networks. Specifically, the networks of FSD are divided into several segments, the deepest network being used as the teacher model, and all shallow networks become multiple student models by adding classifiers. Meanwhile, the distillation path between the deepest network feature and shallow network features is used to reduce the feature difference. A series of experimental results on the ASVspoof 2019 LA and PA datasets show the effectiveness of the proposed method, with significant improvements compared to the baseline. △ Less

Submitted 2 March, 2023; originally announced March 2023.

Comments: Accepted by ICASSP 2023

arXiv:2302.11719 [pdf, other]

Shield Model Predictive Path Integral: A Computationally Efficient Robust MPC Approach Using Control Barrier Functions

Authors: Ji Yin, Charles Dawson, Chuchu Fan, Panagiotis Tsiotras

Abstract: Model Predictive Path Integral (MPPI) control is a type of sampling-based model predictive control that simulates thousands of trajectories and uses these trajectories to synthesize optimal controls on-the-fly. In practice, however, MPPI encounters problems limiting its application. For instance, it has been observed that MPPI tends to make poor decisions if unmodeled dynamics or environmental dis… ▽ More Model Predictive Path Integral (MPPI) control is a type of sampling-based model predictive control that simulates thousands of trajectories and uses these trajectories to synthesize optimal controls on-the-fly. In practice, however, MPPI encounters problems limiting its application. For instance, it has been observed that MPPI tends to make poor decisions if unmodeled dynamics or environmental disturbances exist, preventing its use in safety-critical applications. Moreover, the multi-threaded simulations used by MPPI require significant onboard computational resources, making the algorithm inaccessible to robots without modern GPUs. To alleviate these issues, we propose a novel (Shield-MPPI) algorithm that provides robustness against unpredicted disturbances and achieves real-time planning using a much smaller number of parallel simulations on regular CPUs. The novel Shield-MPPI algorithm is tested on an aggressive autonomous racing platform both in simulation and using experiments. The results show that the proposed controller greatly reduces the number of constraint violations compared to state-of-the-art robust MPPI variants and stochastic MPC methods. △ Less

Submitted 22 February, 2023; originally announced February 2023.

Comments: 8 pages, 7 figures. Submitted to RA-L for review

arXiv:2211.06073 [pdf, other]

SceneFake: An Initial Dataset and Benchmarks for Scene Fake Audio Detection

Authors: Jiangyan Yi, Chenglong Wang, Jianhua Tao, Chu Yuan Zhang, Cunhang Fan, Zhengkun Tian, Haoxin Ma, Ruibo Fu

Abstract: Many datasets have been designed to further the development of fake audio detection. However, fake utterances in previous datasets are mostly generated by altering timbre, prosody, linguistic content or channel noise of original audio. These datasets leave out a scenario, in which the acoustic scene of an original audio is manipulated with a forged one. It will pose a major threat to our society i… ▽ More Many datasets have been designed to further the development of fake audio detection. However, fake utterances in previous datasets are mostly generated by altering timbre, prosody, linguistic content or channel noise of original audio. These datasets leave out a scenario, in which the acoustic scene of an original audio is manipulated with a forged one. It will pose a major threat to our society if some people misuse the manipulated audio with malicious purpose. Therefore, this motivates us to fill in the gap. This paper proposes such a dataset for scene fake audio detection named SceneFake, where a manipulated audio is generated by only tampering with the acoustic scene of an real utterance by using speech enhancement technologies. Some scene fake audio detection benchmark results on the SceneFake dataset are reported in this paper. In addition, an analysis of fake attacks with different speech enhancement technologies and signal-to-noise ratios are presented in this paper. The results indicate that scene fake utterances cannot be reliably detected by baseline models trained on the ASVspoof 2019 dataset. Although these models perform well on the SceneFake training set and seen testing set, their performance is poor on the unseen test set. The dataset (https://zenodo.org/record/7663324#.Y_XKMuPYuUk) and benchmark source codes (https://github.com/ADDchallenge/SceneFake) are publicly available. △ Less

Submitted 4 April, 2024; v1 submitted 11 November, 2022; originally announced November 2022.

Comments: Accepted by Pattern Recognition, 1 April 2024

arXiv:2210.02131 [pdf, other]

doi 10.1109/ICRA48891.2023.10161378

Density Planner: Minimizing Collision Risk in Motion Planning with Dynamic Obstacles using Density-based Reachability

Authors: Laura Lützow, Yue Meng, Andres Chavez Armijos, Chuchu Fan

Abstract: Uncertainty is prevalent in robotics. Due to measurement noise and complex dynamics, we cannot estimate the exact system and environment state. Since conservative motion planners are not guaranteed to find a safe control strategy in a crowded, uncertain environment, we propose a density-based method. Our approach uses a neural network and the Liouville equation to learn the density evolution for a… ▽ More Uncertainty is prevalent in robotics. Due to measurement noise and complex dynamics, we cannot estimate the exact system and environment state. Since conservative motion planners are not guaranteed to find a safe control strategy in a crowded, uncertain environment, we propose a density-based method. Our approach uses a neural network and the Liouville equation to learn the density evolution for a system with an uncertain initial state. We can plan for feasible and probably safe trajectories by applying a gradient-based optimization procedure to minimize the collision risk. We conduct motion planning experiments on simulated environments and environments generated from real-world data and outperform baseline methods such as model predictive control and nonlinear programming. While our method requires offline planning, the online run time is 100 times smaller compared to model predictive control. △ Less

Submitted 27 February, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

Comments: ©2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Journal ref: 2023 IEEE International Conference on Robotics and Automation (ICRA)

arXiv:2209.12270 [pdf, other]

Barrier functions enable safety-conscious force-feedback control

Authors: Charles Dawson, Austin Garrett, Falk Pollok, Yang Zhang, Chuchu Fan

Abstract: In order to be effective partners for humans, robots must become increasingly comfortable with making contact with their environment. Unfortunately, it is hard for robots to distinguish between ``just enough'' and ``too much'' force: some force is required to accomplish the task but too much might damage equipment or injure humans. Traditional approaches to designing compliant force-feedback contr… ▽ More In order to be effective partners for humans, robots must become increasingly comfortable with making contact with their environment. Unfortunately, it is hard for robots to distinguish between ``just enough'' and ``too much'' force: some force is required to accomplish the task but too much might damage equipment or injure humans. Traditional approaches to designing compliant force-feedback controllers, such as stiffness control, require difficult hand-tuning of control parameters and make it difficult to build safe, effective robot collaborators. In this paper, we propose a novel yet easy-to-implement force feedback controller that uses control barrier functions (CBFs) to derive a compliant controller directly from users' specifications of the maximum allowable forces and torques. We compare our approach to traditional stiffness control to demonstrate potential advantages of our control architecture, and we demonstrate the effectiveness of our controller on an example human-robot collaboration task: cooperative manipulation of a bulky object. △ Less

Submitted 25 September, 2022; originally announced September 2022.

arXiv:2209.12266 [pdf, other]

Enforcing safety for vision-based controllers via Control Barrier Functions and Neural Radiance Fields

Authors: Mukun Tong, Charles Dawson, Chuchu Fan

Abstract: To navigate complex environments, robots must increasingly use high-dimensional visual feedback (e.g. images) for control. However, relying on high-dimensional image data to make control decisions raises important questions; particularly, how might we prove the safety of a visual-feedback controller? Control barrier functions (CBFs) are powerful tools for certifying the safety of feedback controll… ▽ More To navigate complex environments, robots must increasingly use high-dimensional visual feedback (e.g. images) for control. However, relying on high-dimensional image data to make control decisions raises important questions; particularly, how might we prove the safety of a visual-feedback controller? Control barrier functions (CBFs) are powerful tools for certifying the safety of feedback controllers in the state-feedback setting, but CBFs have traditionally been poorly-suited to visual feedback control due to the need to predict future observations in order to evaluate the barrier function. In this work, we solve this issue by leveraging recent advances in neural radiance fields (NeRFs), which learn implicit representations of 3D scenes and can render images from previously-unseen camera perspectives, to provide single-step visual foresight for a CBF-based controller. This novel combination is able to filter out unsafe actions and intervene to preserve safety. We demonstrate the effect of our controller in real-time simulation experiments where it successfully prevents the robot from taking dangerous actions. △ Less

Submitted 28 February, 2023; v1 submitted 25 September, 2022; originally announced September 2022.

Comments: Accepted to ICRA 2023

arXiv:2209.08073 [pdf, other]

Case Studies for Computing Density of Reachable States for Safe Autonomous Motion Planning

Authors: Yue Meng, Zeng Qiu, Md Tawhid Bin Waez, Chuchu Fan

Abstract: Density of the reachable states can help understand the risk of safety-critical systems, especially in situations when worst-case reachability is too conservative. Recent work provides a data-driven approach to compute the density distribution of autonomous systems' forward reachable states online. In this paper, we study the use of such approach in combination with model predictive control for ve… ▽ More Density of the reachable states can help understand the risk of safety-critical systems, especially in situations when worst-case reachability is too conservative. Recent work provides a data-driven approach to compute the density distribution of autonomous systems' forward reachable states online. In this paper, we study the use of such approach in combination with model predictive control for verifiable safe path planning under uncertainties. We first use the learned density distribution to compute the risk of collision online. If such risk exceeds the acceptable threshold, our method will plan for a new path around the previous trajectory, with the risk of collision below the threshold. Our method is well-suited to handle systems with uncertainties and complicated dynamics as our data-driven approach does not need an analytical form of the systems' dynamics and can estimate forward state density with an arbitrary initial distribution of uncertainties. We design two challenging scenarios (autonomous driving and hovercraft control) for safe motion planning in environments with obstacles under system uncertainties. We first show that our density estimation approach can reach a similar accuracy as the Monte-Carlo-based method while using only 0.01X training samples. By leveraging the estimated risk, our algorithm achieves the highest success rate in goal reaching when enforcing the safety rate above 0.99. △ Less

Submitted 16 September, 2022; originally announced September 2022.

Comments: NASA Formal Methods 2022

arXiv:2208.09618 [pdf, other]

Fully Automated End-to-End Fake Audio Detection

Authors: Chenglong Wang, Jiangyan Yi, Jianhua Tao, Haiyang Sun, Xun Chen, Zhengkun Tian, Haoxin Ma, Cunhang Fan, Ruibo Fu

Abstract: The existing fake audio detection systems often rely on expert experience to design the acoustic features or manually design the hyperparameters of the network structure. However, artificial adjustment of the parameters can have a relatively obvious influence on the results. It is almost impossible to manually set the best set of parameters. Therefore this paper proposes a fully automated end-toen… ▽ More The existing fake audio detection systems often rely on expert experience to design the acoustic features or manually design the hyperparameters of the network structure. However, artificial adjustment of the parameters can have a relatively obvious influence on the results. It is almost impossible to manually set the best set of parameters. Therefore this paper proposes a fully automated end-toend fake audio detection method. We first use wav2vec pre-trained model to obtain a high-level representation of the speech. Furthermore, for the network structure, we use a modified version of the differentiable architecture search (DARTS) named light-DARTS. It learns deep speech representations while automatically learning and optimizing complex neural structures consisting of convolutional operations and residual blocks. The experimental results on the ASVspoof 2019 LA dataset show that our proposed system achieves an equal error rate (EER) of 1.08%, which outperforms the state-of-the-art single system. △ Less

Submitted 20 August, 2022; originally announced August 2022.

arXiv:2208.03051 [pdf, other]

Hybrid Multimodal Feature Extraction, Mining and Fusion for Sentiment Analysis

Authors: Jia Li, Ziyang Zhang, Junjie Lang, Yueqi Jiang, Liuwei An, Peng Zou, Yangyang Xu, Sheng Gao, Jie Lin, Chunxiao Fan, Xiao Sun, Meng Wang

Abstract: In this paper, we present our solutions for the Multimodal Sentiment Analysis Challenge (MuSe) 2022, which includes MuSe-Humor, MuSe-Reaction and MuSe-Stress Sub-challenges. The MuSe 2022 focuses on humor detection, emotional reactions and multimodal emotional stress utilizing different modalities and data sets. In our work, different kinds of multimodal features are extracted, including acoustic,… ▽ More In this paper, we present our solutions for the Multimodal Sentiment Analysis Challenge (MuSe) 2022, which includes MuSe-Humor, MuSe-Reaction and MuSe-Stress Sub-challenges. The MuSe 2022 focuses on humor detection, emotional reactions and multimodal emotional stress utilizing different modalities and data sets. In our work, different kinds of multimodal features are extracted, including acoustic, visual, text and biological features. These features are fused by TEMMA and GRU with self-attention mechanism frameworks. In this paper, 1) several new audio features, facial expression features and paragraph-level text embeddings are extracted for accuracy improvement. 2) we substantially improve the accuracy and reliability of multimodal sentiment prediction by mining and blending the multimodal features. 3) effective data augmentation strategies are applied in model training to alleviate the problem of sample imbalance and prevent the model from learning biased subject characters. For the MuSe-Humor sub-challenge, our model obtains the AUC score of 0.8932. For the MuSe-Reaction sub-challenge, the Pearson's Correlations Coefficient of our approach on the test set is 0.3879, which outperforms all other participants. For the MuSe-Stress sub-challenge, our approach outperforms the baseline in both arousal and valence on the test dataset, reaching a final combined result of 0.5151. △ Less

Submitted 12 August, 2022; v1 submitted 5 August, 2022; originally announced August 2022.

Comments: 8 pages, 2 figures, to appear in MuSe 2022 (ACM MM2022 co-located workshop)

arXiv:2208.01214 [pdf, other]

doi 10.1145/3552466.3556526

Audio Deepfake Detection Based on a Combination of F0 Information and Real Plus Imaginary Spectrogram Features

Authors: Jun Xue, Cunhang Fan, Zhao Lv, Jianhua Tao, Jiangyan Yi, Chengshi Zheng, Zhengqi Wen, Minmin Yuan, Shegang Shao

Abstract: Recently, pioneer research works have proposed a large number of acoustic features (log power spectrogram, linear frequency cepstral coefficients, constant Q cepstral coefficients, etc.) for audio deepfake detection, obtaining good performance, and showing that different subbands have different contributions to audio deepfake detection. However, this lacks an explanation of the specific informatio… ▽ More Recently, pioneer research works have proposed a large number of acoustic features (log power spectrogram, linear frequency cepstral coefficients, constant Q cepstral coefficients, etc.) for audio deepfake detection, obtaining good performance, and showing that different subbands have different contributions to audio deepfake detection. However, this lacks an explanation of the specific information in the subband, and these features also lose information such as phase. Inspired by the mechanism of synthetic speech, the fundamental frequency (F0) information is used to improve the quality of synthetic speech, while the F0 of synthetic speech is still too average, which differs significantly from that of real speech. It is expected that F0 can be used as important information to discriminate between bonafide and fake speech, while this information cannot be used directly due to the irregular distribution of F0. Insteadly, the frequency band containing most of F0 is selected as the input feature. Meanwhile, to make full use of the phase and full-band information, we also propose to use real and imaginary spectrogram features as complementary input features and model the disjoint subbands separately. Finally, the results of F0, real and imaginary spectrogram features are fused. Experimental results on the ASVspoof 2019 LA dataset show that our proposed system is very effective for the audio deepfake detection task, achieving an equivalent error rate (EER) of 0.43%, which surpasses almost all systems. △ Less

Submitted 1 August, 2022; originally announced August 2022.

arXiv:2206.02748 [pdf, other]

Compound Multi-branch Feature Fusion for Real Image Restoration

Authors: Chi-Mao Fan, Tsung-Jung Liu, Kuan-Hsien Liu

Abstract: Image restoration is a challenging and ill-posed problem which also has been a long-standing issue. However, most of learning based restoration methods are proposed to target one degradation type which means they are lack of generalization. In this paper, we proposed a multi-branch restoration model inspired from the Human Visual System (i.e., Retinal Ganglion Cells) which can achieve multiple res… ▽ More Image restoration is a challenging and ill-posed problem which also has been a long-standing issue. However, most of learning based restoration methods are proposed to target one degradation type which means they are lack of generalization. In this paper, we proposed a multi-branch restoration model inspired from the Human Visual System (i.e., Retinal Ganglion Cells) which can achieve multiple restoration tasks in a general framework. The experiments show that the proposed multi-branch architecture, called CMFNet, has competitive performance results on four datasets, including image dehazing, deraindrop, and deblurring, which are very common applications for autonomous cars. The source code and pretrained models of three restoration tasks are available at https://github.com/FanChiMao/CMFNet. △ Less

Submitted 2 June, 2022; originally announced June 2022.

arXiv:2205.02758 [pdf, other]

Quantitative Measures for Integrating Resilience into Transportation Planning Practice: Study in Texas

Authors: Cheng-Chun Lee, Akhil Rajput, Chia-Wei Hsu, Chao Fan, Faxi Yuan, Shangjia Dong, Amir Esmalian, Hamed Farahmand, Flavia Ioana Patrascu, Chia-Fu Liu, Bo Li, Junwei Ma, Ali Mostafavi

Abstract: The objective of this study is to propose a system-level framework with quantitative measures to assess the resilience of road networks. The framework proposed in this paper can help transportation agencies incorporate resilience considerations into project development proactively and to understand the resilience performance of current road networks effectively. This study identified and implement… ▽ More The objective of this study is to propose a system-level framework with quantitative measures to assess the resilience of road networks. The framework proposed in this paper can help transportation agencies incorporate resilience considerations into project development proactively and to understand the resilience performance of current road networks effectively. This study identified and implemented four quantitative metrics to classify the criticality of road segments based on critical dimensions of road network resilience, and two integrated metrics were proposed to combine all metrics to show the overall resilience performance of road segments. A case study was conducted on the Texas road networks to demonstrate the effectiveness of implementing this framework in a practical scenario. Since the data used in this study is available to other states and countries, the framework presented in this study can be adopted by other transportation agencies across the globe for regional transportation resilience assessments. △ Less

Submitted 5 May, 2022; v1 submitted 4 April, 2022; originally announced May 2022.

arXiv:2203.02038 [pdf, other]

Robust Counterexample-guided Optimization for Planning from Differentiable Temporal Logic

Authors: Charles Dawson, Chuchu Fan

Abstract: Signal temporal logic (STL) provides a powerful, flexible framework for specifying complex autonomy tasks; however, existing methods for planning based on STL specifications have difficulty scaling to long-horizon tasks and are not robust to external disturbances. In this paper, we present an algorithm for finding robust plans that satisfy STL specifications. Our method alternates between local op… ▽ More Signal temporal logic (STL) provides a powerful, flexible framework for specifying complex autonomy tasks; however, existing methods for planning based on STL specifications have difficulty scaling to long-horizon tasks and are not robust to external disturbances. In this paper, we present an algorithm for finding robust plans that satisfy STL specifications. Our method alternates between local optimization and local falsification, using automatically differentiable temporal logic to iteratively optimize its plan in response to counterexamples found during the falsification process. We benchmark our counterexample-guided planning method against state-of-the-art planning methods on two long-horizon satellite rendezvous missions, showing that our method finds high-quality plans that satisfy STL specifications despite adversarial disturbances. We find that our method consistently finds plans that are robust to adversarial disturbances and requires less than half the time of competing methods. We provide an implementation of our planner at https://github.com/MIT-REALM/architect. △ Less

Submitted 3 March, 2022; originally announced March 2022.

arXiv:2203.01645 [pdf, other]

doi 10.23919/EUSIPCO55093.2022.9909521

Selective Residual M-Net for Real Image Denoising

Authors: Chi-Mao Fan, Tsung-Jung Liu, Kuan-Hsien Liu

Abstract: Image restoration is a low-level vision task which is to restore degraded images to noise-free images. With the success of deep neural networks, the convolutional neural networks surpass the traditional restoration methods and become the mainstream in the computer vision area. To advance the performanceof denoising algorithms, we propose a blind real image denoising network (SRMNet) by employing a… ▽ More Image restoration is a low-level vision task which is to restore degraded images to noise-free images. With the success of deep neural networks, the convolutional neural networks surpass the traditional restoration methods and become the mainstream in the computer vision area. To advance the performanceof denoising algorithms, we propose a blind real image denoising network (SRMNet) by employing a hierarchical architecture improved from U-Net. Specifically, we use a selective kernel with residual block on the hierarchical structure called M-Net to enrich the multi-scale semantic information. Furthermore, our SRMNet has competitive performance results on two synthetic and two real-world noisy datasets in terms of quantitative metrics and visual quality. The source code and pretrained model are available at https://github.com/TentativeGitHub/SRMNet. △ Less

Submitted 3 March, 2022; originally announced March 2022.

Comments: arXiv admin note: text overlap with arXiv:2203.01296

arXiv:2203.01296 [pdf, other]

doi 10.1109/ICIP46576.2022.9897503

Half Wavelet Attention on M-Net+ for Low-Light Image Enhancement

Authors: Chi-Mao Fan, Tsung-Jung Liu, Kuan-Hsien Liu

Abstract: Low-Light Image Enhancement is a computer vision task which intensifies the dark images to appropriate brightness. It can also be seen as an ill-posed problem in image restoration domain. With the success of deep neural networks, the convolutional neural networks surpass the traditional algorithm-based methods and become the mainstream in the computer vision area. To advance the performance of enh… ▽ More Low-Light Image Enhancement is a computer vision task which intensifies the dark images to appropriate brightness. It can also be seen as an ill-posed problem in image restoration domain. With the success of deep neural networks, the convolutional neural networks surpass the traditional algorithm-based methods and become the mainstream in the computer vision area. To advance the performance of enhancement algorithms, we propose an image enhancement network (HWMNet) based on an improved hierarchical model: M-Net+. Specifically, we use a half wavelet attention block on M-Net+ to enrich the features from wavelet domain. Furthermore, our HWMNet has competitive performance results on two image enhancement datasets in terms of quantitative metrics and visual quality. The source code and pretrained model are available at https://github.com/FanChiMao/HWMNet. △ Less

Submitted 2 March, 2022; originally announced March 2022.

arXiv:2202.14009 [pdf, other]

doi 10.1109/ISCAS48785.2022.9937486

SUNet: Swin Transformer UNet for Image Denoising

Authors: Chi-Mao Fan, Tsung-Jung Liu, Kuan-Hsien Liu

Abstract: Image restoration is a challenging ill-posed problem which also has been a long-standing issue. In the past few years, the convolution neural networks (CNNs) almost dominated the computer vision and had achieved considerable success in different levels of vision tasks including image restoration. However, recently the Swin Transformer-based model also shows impressive performance, even surpasses t… ▽ More Image restoration is a challenging ill-posed problem which also has been a long-standing issue. In the past few years, the convolution neural networks (CNNs) almost dominated the computer vision and had achieved considerable success in different levels of vision tasks including image restoration. However, recently the Swin Transformer-based model also shows impressive performance, even surpasses the CNN-based methods to become the state-of-the-art on high-level vision tasks. In this paper, we proposed a restoration model called SUNet which uses the Swin Transformer layer as our basic block and then is applied to UNet architecture for image denoising. The source code and pre-trained models are available at https://github.com/FanChiMao/SUNet. △ Less

Submitted 28 February, 2022; originally announced February 2022.

arXiv:2202.11762 [pdf, other]

Safe Control with Learned Certificates: A Survey of Neural Lyapunov, Barrier, and Contraction methods

Authors: Charles Dawson, Sicun Gao, Chuchu Fan

Abstract: Learning-enabled control systems have demonstrated impressive empirical performance on challenging control problems in robotics, but this performance comes at the cost of reduced transparency and lack of guarantees on the safety or stability of the learned controllers. In recent years, new techniques have emerged to provide these guarantees by learning certificates alongside control policies -- th… ▽ More Learning-enabled control systems have demonstrated impressive empirical performance on challenging control problems in robotics, but this performance comes at the cost of reduced transparency and lack of guarantees on the safety or stability of the learned controllers. In recent years, new techniques have emerged to provide these guarantees by learning certificates alongside control policies -- these certificates provide concise, data-driven proofs that guarantee the safety and stability of the learned control system. These methods not only allow the user to verify the safety of a learned controller but also provide supervision during training, allowing safety and stability requirements to influence the training process itself. In this paper, we provide a comprehensive survey of this rapidly developing field of certificate learning. We hope that this paper will serve as an accessible introduction to the theory and practice of certificate learning, both to those who wish to apply these tools to practical robotics problems and to those who wish to dive more deeply into the theory of learning for control. △ Less

Submitted 20 December, 2022; v1 submitted 23 February, 2022; originally announced February 2022.

Comments: Accepted at IEEE Transactions on Robotics. Supplementary code available at https://github.com/MIT-REALM/neural_clbf

arXiv:2202.08433 [pdf, ps, other]

ADD 2022: the First Audio Deep Synthesis Detection Challenge

Authors: Jiangyan Yi, Ruibo Fu, Jianhua Tao, Shuai Nie, Haoxin Ma, Chenglong Wang, Tao Wang, Zhengkun Tian, Xiaohui Zhang, Ye Bai, Cunhang Fan, Shan Liang, Shiming Wang, Shuai Zhang, Xinrui Yan, Le Xu, Zhengqi Wen, Haizhou Li, Zheng Lian, Bin Liu

Abstract: Audio deepfake detection is an emerging topic, which was included in the ASVspoof 2021. However, the recent shared tasks have not covered many real-life and challenging scenarios. The first Audio Deep synthesis Detection challenge (ADD) was motivated to fill in the gap. The ADD 2022 includes three tracks: low-quality fake audio detection (LF), partially fake audio detection (PF) and audio fake gam… ▽ More Audio deepfake detection is an emerging topic, which was included in the ASVspoof 2021. However, the recent shared tasks have not covered many real-life and challenging scenarios. The first Audio Deep synthesis Detection challenge (ADD) was motivated to fill in the gap. The ADD 2022 includes three tracks: low-quality fake audio detection (LF), partially fake audio detection (PF) and audio fake game (FG). The LF track focuses on dealing with bona fide and fully fake utterances with various real-world noises etc. The PF track aims to distinguish the partially fake audio from the real. The FG track is a rivalry game, which includes two tasks: an audio generation task and an audio fake detection task. In this paper, we describe the datasets, evaluation metrics, and protocols. We also report major findings that reflect the recent advances in audio deepfake detection tasks. △ Less

Submitted 2 July, 2024; v1 submitted 16 February, 2022; originally announced February 2022.

Comments: Accepted by ICASSP 2022

arXiv:2201.05247 [pdf, other]

Multi-agent Motion Planning from Signal Temporal Logic Specifications

Authors: Dawei Sun, Jingkai Chen, Sayan Mitra, Chuchu Fan

Abstract: We tackle the challenging problem of multi-agent cooperative motion planning for complex tasks described using signal temporal logic (STL), where robots can have nonlinear and nonholonomic dynamics. Existing methods in multi-agent motion planning, especially those based on discrete abstractions and model predictive control (MPC), suffer from limited scalability with respect to the complexity of th… ▽ More We tackle the challenging problem of multi-agent cooperative motion planning for complex tasks described using signal temporal logic (STL), where robots can have nonlinear and nonholonomic dynamics. Existing methods in multi-agent motion planning, especially those based on discrete abstractions and model predictive control (MPC), suffer from limited scalability with respect to the complexity of the task, the size of the workspace, and the planning horizon. We present a method based on {\em timed waypoints\/} to address this issue. We show that timed waypoints can help abstract nonlinear behaviors of the system as safety envelopes around the reference path defined by those waypoints. Then the search for waypoints satisfying the STL specifications can be inductively encoded as a mixed-integer linear program. The agents following the synthesized timed waypoints have their tasks automatically allocated, and are guaranteed to satisfy the STL specifications while avoiding collisions. We evaluate the algorithm on a wide variety of benchmarks. Results show that it supports multi-agent planning from complex specification over long planning horizons, and significantly outperforms state-of-the-art abstraction-based and MPC-based motion planning methods. The implementation is available at https://github.com/sundw2014/STLPlanning. △ Less

Submitted 13 January, 2022; originally announced January 2022.

Comments: Accepted to IEEE Robotics and Automation Letters (RA-L)

arXiv:2201.01918 [pdf, other]

SABLAS: Learning Safe Control for Black-box Dynamical Systems

Authors: Zengyi Qin, Dawei Sun, Chuchu Fan

Abstract: Control certificates based on barrier functions have been a powerful tool to generate probably safe control policies for dynamical systems. However, existing methods based on barrier certificates are normally for white-box systems with differentiable dynamics, which makes them inapplicable to many practical applications where the system is a black-box and cannot be accurately modeled. On the other… ▽ More Control certificates based on barrier functions have been a powerful tool to generate probably safe control policies for dynamical systems. However, existing methods based on barrier certificates are normally for white-box systems with differentiable dynamics, which makes them inapplicable to many practical applications where the system is a black-box and cannot be accurately modeled. On the other side, model-free reinforcement learning (RL) methods for black-box systems suffer from lack of safety guarantees and low sampling efficiency. In this paper, we propose a novel method that can learn safe control policies and barrier certificates for black-box dynamical systems, without requiring for an accurate system model. Our method re-designs the loss function to back-propagate gradient to the control policy even when the black-box dynamical system is non-differentiable, and we show that the safety certificates hold on the black-box system. Empirical results in simulation show that our method can significantly improve the performance of the learned policies by achieving nearly 100% safety and goal reaching rates using much fewer training samples, compared to state-of-the-art black-box safe control methods. Our learned agents can also generalize to unseen scenarios while keeping the original performance. The source code can be found at https://github.com/Zengyi-Qin/bcbf. △ Less

Submitted 8 January, 2022; v1 submitted 5 January, 2022; originally announced January 2022.

Comments: IEEE Robotics and Automation Letters, 2022

arXiv:2201.00932 [pdf, other]

Learning Safe, Generalizable Perception-based Hybrid Control with Certificates

Authors: Charles Dawson, Bethany Lowenkamp, Dylan Goff, Chuchu Fan

Abstract: Many robotic tasks require high-dimensional sensors such as cameras and Lidar to navigate complex environments, but developing certifiably safe feedback controllers around these sensors remains a challenging open problem, particularly when learning is involved. Previous works have proved the safety of perception-feedback controllers by separating the perception and control subsystems and making st… ▽ More Many robotic tasks require high-dimensional sensors such as cameras and Lidar to navigate complex environments, but developing certifiably safe feedback controllers around these sensors remains a challenging open problem, particularly when learning is involved. Previous works have proved the safety of perception-feedback controllers by separating the perception and control subsystems and making strong assumptions on the abilities of the perception subsystem. In this work, we introduce a novel learning-enabled perception-feedback hybrid controller, where we use Control Barrier Functions (CBFs) and Control Lyapunov Functions (CLFs) to show the safety and liveness of a full-stack perception-feedback controller. We use neural networks to learn a CBF and CLF for the full-stack system directly in the observation space of the robot, without the need to assume a separate perception-based state estimator. Our hybrid controller, called LOCUS (Learning-enabled Observation-feedback Control Using Switching), can safely navigate unknown environments, consistently reach its goal, and generalizes safely to environments outside of the training dataset. We demonstrate LOCUS in experiments both in simulation and in hardware, where it successfully navigates a changing environment using feedback from a Lidar sensor. △ Less

Submitted 3 January, 2022; originally announced January 2022.

Comments: Accepted for publication in RA-L

arXiv:2112.08232 [pdf]

doi 10.1088/1361-6560/ac7193

RA V-Net: Deep learning network for automated liver segmentation

Authors: Zhiqi Lee, Sumin Qi, Chongchong Fan, Ziwei Xie

Abstract: Accurate segmentation of the liver is a prerequisite for the diagnosis of disease. Automated segmentation is an important application of computer-aided detection and diagnosis of liver disease. In recent years, automated processing of medical images has gained breakthroughs. However, the low contrast of abdominal scan CT images and the complexity of liver morphology make accurate automatic segment… ▽ More Accurate segmentation of the liver is a prerequisite for the diagnosis of disease. Automated segmentation is an important application of computer-aided detection and diagnosis of liver disease. In recent years, automated processing of medical images has gained breakthroughs. However, the low contrast of abdominal scan CT images and the complexity of liver morphology make accurate automatic segmentation challenging. In this paper, we propose RA V-Net, which is an improved medical image automatic segmentation model based on U-Net. It has the following three main innovations. CofRes Module (Composite Original Feature Residual Module) is proposed. With more complex convolution layers and skip connections to make it obtain a higher level of image feature extraction capability and prevent gradient disappearance or explosion. AR Module (Attention Recovery Module) is proposed to reduce the computational effort of the model. In addition, the spatial features between the data pixels of the encoding and decoding modules are sensed by adjusting the channels and LSTM convolution. Finally, the image features are effectively retained. CA Module (Channel Attention Module) is introduced, which used to extract relevant channels with dependencies and strengthen them by matrix dot product, while weakening irrelevant channels without dependencies. The purpose of channel attention is achieved. The attention mechanism provided by LSTM convolution and CA Module are strong guarantees for the performance of the neural network. The accuracy of U-Net network: 0.9862, precision: 0.9118, DSC: 0.8547, JSC: 0.82. The evaluation metrics of RA V-Net, accuracy: 0.9968, precision: 0.9597, DSC: 0.9654, JSC: 0.9414. The most representative metric for the segmentation effect is DSC, which improves 0.1107 over U-Net, and JSC improves 0.1214. △ Less

Submitted 15 December, 2021; v1 submitted 15 December, 2021; originally announced December 2021.

arXiv:2110.10965 [pdf, other]

2020 CATARACTS Semantic Segmentation Challenge

Authors: Imanol Luengo, Maria Grammatikopoulou, Rahim Mohammadi, Chris Walsh, Chinedu Innocent Nwoye, Deepak Alapatt, Nicolas Padoy, Zhen-Liang Ni, Chen-Chen Fan, Gui-Bin Bian, Zeng-Guang Hou, Heonjin Ha, Jiacheng Wang, Haojie Wang, Dong Guo, Lu Wang, Guotai Wang, Mobarakol Islam, Bharat Giddwani, Ren Hongliang, Theodoros Pissas, Claudio Ravasio, Martin Huber, Jeremy Birch, Joan M. Nunez Do Rio , et al. (15 additional authors not shown)

Abstract: Surgical scene segmentation is essential for anatomy and instrument localization which can be further used to assess tissue-instrument interactions during a surgical procedure. In 2017, the Challenge on Automatic Tool Annotation for cataRACT Surgery (CATARACTS) released 50 cataract surgery videos accompanied by instrument usage annotations. These annotations included frame-level instrument presenc… ▽ More Surgical scene segmentation is essential for anatomy and instrument localization which can be further used to assess tissue-instrument interactions during a surgical procedure. In 2017, the Challenge on Automatic Tool Annotation for cataRACT Surgery (CATARACTS) released 50 cataract surgery videos accompanied by instrument usage annotations. These annotations included frame-level instrument presence information. In 2020, we released pixel-wise semantic annotations for anatomy and instruments for 4670 images sampled from 25 videos of the CATARACTS training set. The 2020 CATARACTS Semantic Segmentation Challenge, which was a sub-challenge of the 2020 MICCAI Endoscopic Vision (EndoVis) Challenge, presented three sub-tasks to assess participating solutions on anatomical structure and instrument segmentation. Their performance was assessed on a hidden test set of 531 images from 10 videos of the CATARACTS test set. △ Less

Submitted 24 February, 2022; v1 submitted 21 October, 2021; originally announced October 2021.

arXiv:2110.00693 [pdf, other]

A Theoretical Overview of Neural Contraction Metrics for Learning-based Control with Guaranteed Stability

Authors: Hiroyasu Tsukamoto, Soon-Jo Chung, Jean-Jacques Slotine, Chuchu Fan

Abstract: This paper presents a theoretical overview of a Neural Contraction Metric (NCM): a neural network model of an optimal contraction metric and corresponding differential Lyapunov function, the existence of which is a necessary and sufficient condition for incremental exponential stability of non-autonomous nonlinear system trajectories. Its innovation lies in providing formal robustness guarantees f… ▽ More This paper presents a theoretical overview of a Neural Contraction Metric (NCM): a neural network model of an optimal contraction metric and corresponding differential Lyapunov function, the existence of which is a necessary and sufficient condition for incremental exponential stability of non-autonomous nonlinear system trajectories. Its innovation lies in providing formal robustness guarantees for learning-based control frameworks, utilizing contraction theory as an analytical tool to study the nonlinear stability of learned systems via convex optimization. In particular, we rigorously show in this paper that, by regarding modeling errors of the learning schemes as external disturbances, the NCM control is capable of obtaining an explicit bound on the distance between a time-varying target trajectory and perturbed solution trajectories, which exponentially decreases with time even under the presence of deterministic and stochastic perturbation. These useful features permit simultaneous synthesis of a contraction metric and associated control law by a neural network, thereby enabling real-time computable and probably robust learning-based control for general control-affine nonlinear systems. △ Less

Submitted 1 October, 2021; originally announced October 2021.

Comments: IEEE Conference on Decision and Control (CDC), Preprint Version. Accepted July, 2021

arXiv:2109.06697 [pdf, other]

Safe Nonlinear Control Using Robust Neural Lyapunov-Barrier Functions

Authors: Charles Dawson, Zengyi Qin, Sicun Gao, Chuchu Fan

Abstract: Safety and stability are common requirements for robotic control systems; however, designing safe, stable controllers remains difficult for nonlinear and uncertain models. We develop a model-based learning approach to synthesize robust feedback controllers with safety and stability guarantees. We take inspiration from robust convex optimization and Lyapunov theory to define robust control Lyapunov… ▽ More Safety and stability are common requirements for robotic control systems; however, designing safe, stable controllers remains difficult for nonlinear and uncertain models. We develop a model-based learning approach to synthesize robust feedback controllers with safety and stability guarantees. We take inspiration from robust convex optimization and Lyapunov theory to define robust control Lyapunov barrier functions that generalize despite model uncertainty. We demonstrate our approach in simulation on problems including car trajectory tracking, nonlinear control with obstacle avoidance, satellite rendezvous with safety constraints, and flight control with a learned ground effect model. Simulation results show that our approach yields controllers that match or exceed the capabilities of robust MPC while reducing computational costs by an order of magnitude. △ Less

Submitted 6 October, 2021; v1 submitted 14 September, 2021; originally announced September 2021.

Comments: Accepted to the 5th Annual Conference on Robot Learning (CoRL 21)

arXiv:2106.12764 [pdf, other]

Density Constrained Reinforcement Learning

Authors: Zengyi Qin, Yuxiao Chen, Chuchu Fan

Abstract: We study constrained reinforcement learning (CRL) from a novel perspective by setting constraints directly on state density functions, rather than the value functions considered by previous works. State density has a clear physical and mathematical interpretation, and is able to express a wide variety of constraints such as resource limits and safety requirements. Density constraints can also avoi… ▽ More We study constrained reinforcement learning (CRL) from a novel perspective by setting constraints directly on state density functions, rather than the value functions considered by previous works. State density has a clear physical and mathematical interpretation, and is able to express a wide variety of constraints such as resource limits and safety requirements. Density constraints can also avoid the time-consuming process of designing and tuning cost functions required by value function-based constraints to encode system specifications. We leverage the duality between density functions and Q functions to develop an effective algorithm to solve the density constrained RL problem optimally and the constrains are guaranteed to be satisfied. We prove that the proposed algorithm converges to a near-optimal solution with a bounded error even when the policy update is imperfect. We use a set of comprehensive experiments to demonstrate the advantages of our approach over state-of-the-art CRL methods, with a wide range of density constrained tasks as well as standard CRL benchmarks such as Safety-Gym. △ Less

Submitted 24 June, 2021; originally announced June 2021.

Comments: Accepted by ICML, 2021

arXiv:2105.08573 [pdf, other]

Dependent Multi-Task Learning with Causal Intervention for Image Captioning

Authors: Wenqing Chen, Jidong Tian, Caoyun Fan, Hao He, Yaohui Jin

Abstract: Recent work for image captioning mainly followed an extract-then-generate paradigm, pre-extracting a sequence of object-based features and then formulating image captioning as a single sequence-to-sequence task. Although promising, we observed two problems in generated captions: 1) content inconsistency where models would generate contradicting facts; 2) not informative enough where models would m… ▽ More Recent work for image captioning mainly followed an extract-then-generate paradigm, pre-extracting a sequence of object-based features and then formulating image captioning as a single sequence-to-sequence task. Although promising, we observed two problems in generated captions: 1) content inconsistency where models would generate contradicting facts; 2) not informative enough where models would miss parts of important information. From a causal perspective, the reason is that models have captured spurious statistical correlations between visual features and certain expressions (e.g., visual features of "long hair" and "woman"). In this paper, we propose a dependent multi-task learning framework with the causal intervention (DMTCI). Firstly, we involve an intermediate task, bag-of-categories generation, before the final task, image captioning. The intermediate task would help the model better understand the visual features and thus alleviate the content inconsistency problem. Secondly, we apply Pearl's do-calculus on the model, cutting off the link between the visual features and possible confounders and thus letting models focus on the causal visual features. Specifically, the high-frequency concept set is considered as the proxy confounders where the real confounders are inferred in the continuous space. Finally, we use a multi-agent reinforcement learning (MARL) strategy to enable end-to-end training and reduce the inter-task error accumulations. The extensive experiments show that our model outperforms the baseline models and achieves competitive performance with state-of-the-art models. △ Less

Submitted 18 May, 2021; originally announced May 2021.

Comments: To be published in IJCAI 2021

arXiv:2105.06270 [pdf, other]

Group Feature Learning and Domain Adversarial Neural Network for aMCI Diagnosis System Based on EEG

Authors: Chen-Chen Fan, Haiqun Xie, Liang Peng, Hongjun Yang, Zhen-Liang Ni, Guan'an Wang, Yan-Jie Zhou, Sheng Chen, Zhijie Fang, Shuyun Huang, Zeng-Guang Hou

Abstract: Medical diagnostic robot systems have been paid more and more attention due to its objectivity and accuracy. The diagnosis of mild cognitive impairment (MCI) is considered an effective means to prevent Alzheimer's disease (AD). Doctors diagnose MCI based on various clinical examinations, which are expensive and the diagnosis results rely on the knowledge of doctors. Therefore, it is necessary to d… ▽ More Medical diagnostic robot systems have been paid more and more attention due to its objectivity and accuracy. The diagnosis of mild cognitive impairment (MCI) is considered an effective means to prevent Alzheimer's disease (AD). Doctors diagnose MCI based on various clinical examinations, which are expensive and the diagnosis results rely on the knowledge of doctors. Therefore, it is necessary to develop a robot diagnostic system to eliminate the influence of human factors and obtain a higher accuracy rate. In this paper, we propose a novel Group Feature Domain Adversarial Neural Network (GF-DANN) for amnestic MCI (aMCI) diagnosis, which involves two important modules. A Group Feature Extraction (GFE) module is proposed to reduce individual differences by learning group-level features through adversarial learning. A Dual Branch Domain Adaptation (DBDA) module is carefully designed to reduce the distribution difference between the source and target domain in a domain adaption way. On three types of data set, GF-DANN achieves the best accuracy compared with classic machine learning and deep learning methods. On the DMS data set, GF-DANN has obtained an accuracy rate of 89.47%, and the sensitivity and specificity are 90% and 89%. In addition, by comparing three EEG data collection paradigms, our results demonstrate that the DMS paradigm has the potential to build an aMCI diagnose robot system. △ Less

Submitted 28 April, 2021; originally announced May 2021.

Comments: This paper has been accepted by 2021 International Conference on Robotics and Automation (ICRA 2021)

arXiv:2103.02114 [pdf, other]

A Computational Design and Evaluation Tool for 3D Structures with Planar Surfaces

Authors: Chang Liu, Wenzhong Yan, Pehuen Moure, Cody Fan, Ankur Mehta

Abstract: Three dimensional (3D) structures composed of planar surfaces can be build out of accessible materials using easier fabrication technique with shorter fabrication time. To better design 3D structures with planar surfaces, realistic models are required to understand and evaluate mechanical behaviors. Existing design tools are either effort-consuming (e.g. finite element analysis) or bounded by assu… ▽ More Three dimensional (3D) structures composed of planar surfaces can be build out of accessible materials using easier fabrication technique with shorter fabrication time. To better design 3D structures with planar surfaces, realistic models are required to understand and evaluate mechanical behaviors. Existing design tools are either effort-consuming (e.g. finite element analysis) or bounded by assumptions (e.g. numerical solutions). In this project, We have built a computational design tool that is (1) capable of rapidly and inexpensively evaluating planar surfaces in 3D structures, with sufficient computational efficiency and accuracy; (2) applicable to complex boundary conditions and loading conditions, both isotropic materials and orthotropic materials; and (3) suitable for rapid accommodation when design parameters need to be adjusted. We demonstrate the efficiency and necessity of this design tool by evaluating a glass table as well as a wood bookcase, and iteratively designing an origami gripper to satisfy performance requirements. This design tool gives non-expert users as well as engineers a simple and effective modus operandi in structural design. △ Less

Submitted 2 March, 2021; originally announced March 2021.

arXiv:2102.08261 [pdf, other]

Optimal Mixed Discrete-Continuous Planning for Linear Hybrid Systems

Authors: Jingkai Chen, Brian Williams, Chuchu Fan

Abstract: Planning in hybrid systems with both discrete and continuous control variables is important for dealing with real-world applications such as extra-planetary exploration and multi-vehicle transportation systems. Meanwhile, generating high-quality solutions given certain hybrid planning specifications is crucial to building high-performance hybrid systems. However, since hybrid planning is challengi… ▽ More Planning in hybrid systems with both discrete and continuous control variables is important for dealing with real-world applications such as extra-planetary exploration and multi-vehicle transportation systems. Meanwhile, generating high-quality solutions given certain hybrid planning specifications is crucial to building high-performance hybrid systems. However, since hybrid planning is challenging in general, most methods use greedy search that is guided by various heuristics, which is neither complete nor optimal and often falls into blind search towards an infinite-action plan. In this paper, we present a hybrid automaton planning formalism and propose an optimal approach that encodes this planning problem as a Mixed Integer Linear Program (MILP) by fixing the action number of automaton runs. We also show an extension of our approach for reasoning over temporally concurrent goals. By leveraging an efficient MILP optimizer, our method is able to generate provably optimal solutions for complex mixed discrete-continuous planning problems within a reasonable time. We use several case studies to demonstrate the extraordinary performance of our hybrid planning method and show that it outperforms a state-of-the-art hybrid planner, Scotty, in both efficiency and solution qualities. △ Less

Submitted 20 February, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

Comments: Accepted at HSCC2021. 12 pages, 8 figures, 3 tables

Showing 1–50 of 75 results for author: Fan, C