-
Leveraging Parameter-Efficient Transfer Learning for Multi-Lingual Text-to-Speech Adaptation
Authors:
Yingting Li,
Ambuj Mehrish,
Bryan Chew,
Bo Cheng,
Soujanya Poria
Abstract:
Different languages have distinct phonetic systems and vary in their prosodic features, making it challenging to develop a Text-to-Speech (TTS) model that can effectively synthesise speech in multilingual settings. Furthermore, a TTS architecture needs to be both effective enough to capture nuances in multiple languages and efficient enough to be practical for deployment. The standard approach is to build a transformer-based model such as SpeechT5 and train it on a large multilingual dataset. As these models grow in size, conventional fine-tuning to adapt them becomes impractical due to the heavy computational cost. In this paper, we propose to integrate parameter-efficient transfer learning (PETL) methods such as adapters and hypernetworks with a TTS architecture for multilingual speech synthesis. Notably, in our experiments PETL methods achieve comparable or even better performance than full fine-tuning with only $\sim$2.5\% tunable parameters. The code and samples are available at: https://anonymous.4open.science/r/multilingualTTS-BA4C.
Submitted 24 June, 2024;
originally announced June 2024.
-
Self-supervised learning for classifying paranasal anomalies in the maxillary sinus
Authors:
Debayan Bhattacharya,
Finn Behrendt,
Benjamin Tobias Becker,
Lennart Maack,
Dirk Beyersdorff,
Elina Petersen,
Marvin Petersen,
Bastian Cheng,
Dennis Eggert,
Christian Betz,
Anna Sophie Hoffmann,
Alexander Schlaefer
Abstract:
Purpose: Paranasal anomalies, frequently identified in routine radiological screenings, exhibit diverse morphological characteristics. Due to the diversity of anomalies, supervised learning methods require large labelled datasets exhibiting diverse anomaly morphology. Self-supervised learning (SSL) can be used to learn representations from unlabelled data. However, there are no SSL methods designed for the downstream task of classifying paranasal anomalies in the maxillary sinus (MS).
Methods: Our approach uses a 3D Convolutional Autoencoder (CAE) trained in an unsupervised anomaly detection (UAD) framework. Initially, we train the 3D CAE to reduce reconstruction errors when reconstructing normal maxillary sinus (MS) images. Then, this CAE is applied to an unlabelled dataset to generate coarse anomaly locations by creating residual MS images. Following this, a 3D Convolutional Neural Network (CNN) reconstructs these residual images, which forms our SSL task. Lastly, we fine-tune the encoder part of the 3D CNN on a labelled dataset of normal and anomalous MS images.
Results: The proposed SSL technique exhibits superior performance compared to existing generic self-supervised methods, especially in scenarios with limited annotated data. When trained on just 10% of the annotated dataset, our method achieves an Area Under the Precision-Recall Curve (AUPRC) of 0.79 for the downstream classification task. This performance surpasses other methods, with BYOL attaining an AUPRC of 0.75, SimSiam at 0.74, SimCLR at 0.73 and Masked Autoencoding using SparK at 0.75.
Conclusion: A self-supervised learning approach that inherently focuses on localizing paranasal anomalies proves to be advantageous, particularly when the subsequent task involves differentiating normal from anomalous maxillary sinuses. Access our code at https://github.com/mtec-tuhh/self-supervised-paranasal-anomaly
Submitted 29 April, 2024;
originally announced April 2024.
-
MaSkel: A Model for Human Whole-body X-rays Generation from Human Masking Images
Authors:
Yingjie Xi,
Boyuan Cheng,
Jingyao Cai,
Jian Jun Zhang,
Xiaosong Yang
Abstract:
Human whole-body X-rays could offer a valuable reference for various applications, including medical diagnostics, digital animation modeling, and ergonomic design. The traditional method of obtaining X-ray information requires the use of CT (Computed Tomography) scan machines, which emit potentially harmful radiation. It therefore faces a significant limitation in realistic applications because it lacks adaptability and safety. In our work, we propose a new method to directly generate 2D human whole-body X-rays from human masking images. The predicted images will be similar to the real ones, with the same image style and anatomic structure. We employ a data-driven strategy. By leveraging advanced generative techniques, our model MaSkel (Masking image to Skeleton X-rays) can generate a high-quality X-ray image from a human masking image without the need for invasive and harmful radiation exposure, which not only provides a new path to generate highly anatomic and customized data but also reduces health risks. To our knowledge, MaSkel is the first work for predicting whole-body X-rays. Our work has two parts. First, to address the data limitation problem, we use diffusion-based techniques for data augmentation, which provides two synthetic datasets for preliminary pretraining. Second, we design a two-stage training strategy to train MaSkel. Finally, we make qualitative and quantitative evaluations of the generated X-rays. In addition, we invite professional doctors to assess our predicted data. These evaluations demonstrate MaSkel's superior ability to generate anatomic X-rays from human masking images. The related code and links to the dataset are available at https://github.com/2022yingjie/MaSkel.
Submitted 13 April, 2024;
originally announced April 2024.
-
HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks
Authors:
Yingting Li,
Rishabh Bhardwaj,
Ambuj Mehrish,
Bo Cheng,
Soujanya Poria
Abstract:
Neural speech synthesis, or text-to-speech (TTS), aims to transform a signal from the text domain to the speech domain. While TTS architectures that train and test on the same set of speakers have seen significant improvements, out-of-domain speaker performance still faces enormous limitations. Domain adaptation to a new set of speakers can be achieved by fine-tuning the whole model for each new domain, which is parameter-inefficient. This problem can be solved by Adapters, which provide a parameter-efficient alternative for domain adaptation. Although Adapters are popular in NLP, speech synthesis has not seen much improvement from them. In this work, we present HyperTTS, which comprises a small learnable network, a "hypernetwork", that generates the parameters of the Adapter blocks, allowing us to condition Adapters on speaker representations and making them dynamic. Extensive evaluations in two domain adaptation settings demonstrate its effectiveness in achieving state-of-the-art performance in the parameter-efficient regime. We also compare different variants of HyperTTS with baselines in different studies. Promising results on the dynamic adaptation of adapter parameters using hypernetworks open up new avenues for domain-generic multi-speaker TTS systems. The audio samples and code are available at https://github.com/declare-lab/HyperTTS.
Submitted 6 April, 2024;
originally announced April 2024.
-
CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models
Authors:
Xiang Li,
Fan Bu,
Ambuj Mehrish,
Yingting Li,
Jiale Han,
Bo Cheng,
Soujanya Poria
Abstract:
Neural Text-to-Speech (TTS) systems find broad applications in voice assistants, e-learning, and audiobook creation. The pursuit of modern models, like Diffusion Models (DMs), holds promise for achieving high-fidelity, real-time speech synthesis. Yet, the efficiency of multi-step sampling in Diffusion Models presents challenges. Efforts have been made to integrate GANs with DMs, speeding up inference by approximating denoising distributions, but this introduces issues with model convergence due to adversarial training. To overcome this, we introduce CM-TTS, a novel architecture grounded in consistency models (CMs). Drawing inspiration from continuous-time diffusion models, CM-TTS achieves top-quality speech synthesis in fewer steps without adversarial training or pre-trained model dependencies. We further design weighted samplers to incorporate different sampling positions into model training with dynamic probabilities, ensuring unbiased learning throughout the entire training process. We present a real-time mel-spectrogram generation consistency model, validated through comprehensive evaluations. Experimental results underscore CM-TTS's superiority over existing single-step speech synthesis systems, representing a significant advancement in the field.
Submitted 31 March, 2024;
originally announced April 2024.
-
From Flies to Robots: Inverted Landing in Small Quadcopters with Dynamic Perching
Authors:
Bryan Habas,
Bo Cheng
Abstract:
Inverted landing is a routine behavior among a number of animal fliers. However, mastering this feat poses a considerable challenge for robotic fliers, especially when performing dynamic perching with rapid body rotations (or flips) and landing against gravity. Studies of inverted landing in flies have suggested that optical flow senses are closely linked to the precise triggering and control of body flips that lead to a variety of successful landing behaviors. Building upon this knowledge, we aimed to replicate the flies' landing behaviors in small quadcopters by developing a control policy general to arbitrary ceiling-approach conditions. First, we employed reinforcement learning in simulation to optimize discrete sensory-motor pairs across a broad spectrum of ceiling-approach velocities and directions. Next, we converted the sensory-motor pairs to a two-stage control policy in a continuous augmented-optical flow space. The control policy consists of a first-stage Flip-Trigger Policy, which employs a one-class support vector machine, and a second-stage Flip-Action Policy, implemented as a feed-forward neural network. To transfer the inverted-landing policy to physical systems, we utilized domain randomization and system identification techniques for a zero-shot sim-to-real transfer. As a result, we successfully achieved a range of robust inverted-landing behaviors in small quadcopters, emulating those observed in flies.
Submitted 29 February, 2024;
originally announced March 2024.
-
Full-State Prescribed Performance-Based Consensus of Double-Integrator Multi-Agent Systems with Jointly Connected Topologies
Authors:
Yahui Hou,
Bin Cheng
Abstract:
This paper addresses the full-state prescribed performance-based consensus problem for double-integrator multi-agent systems with jointly connected topologies. To improve the transient performance, a distributed prescribed performance control protocol consisting of the transformed relative position and the transformed relative velocity is proposed, where the communication topology satisfies the jointly connected assumption. Unlike the existing literature, two independent transient performance specifications imposed on relative positions and relative velocities can be guaranteed simultaneously. Finally, a numerical example is used to validate the effectiveness of the proposed protocol.
Submitted 10 January, 2024;
originally announced January 2024.
-
Parallel in-memory wireless computing
Authors:
Cong Wang,
Gong-Jie Ruan,
Zai-Zheng Yang,
Xing-Jian Yangdong,
Yixiang Li,
Liang Wu,
Yingmeng Ge,
Yichen Zhao,
Chen Pan,
Wei Wei,
Li-Bo Wang,
Bin Cheng,
Zaichen Zhang,
Chuan Zhang,
Shi-Jun Liang,
Feng Miao
Abstract:
Parallel wireless digital communication with ultralow power consumption is critical for emerging edge technologies such as 5G and the Internet of Things. However, the physical separation between digital computing units and analogue transmission units in traditional wireless technology leads to high power consumption. Here we report a parallel in-memory wireless computing scheme. The approach combines in-memory computing with wireless communication using memristive crossbar arrays. We show that the system can be used for the radio transmission of a binary stream of 480 bits with a bit error rate of 0. The in-memory wireless computing uses two orders of magnitude less power than conventional technology (based on digital-to-analogue and analogue-to-digital converters). We also show that the approach can be applied to acoustic and optical wireless communications.
Submitted 30 September, 2023;
originally announced October 2023.
-
Multiple Instance Ensembling For Paranasal Anomaly Classification In The Maxillary Sinus
Authors:
Debayan Bhattacharya,
Finn Behrendt,
Benjamin Tobias Becker,
Dirk Beyersdorff,
Elina Petersen,
Marvin Petersen,
Bastian Cheng,
Dennis Eggert,
Christian Betz,
Anna Sophie Hoffmann,
Alexander Schlaefer
Abstract:
Paranasal anomalies are commonly discovered during routine radiological screenings and can present with a wide range of morphological features. This diversity can make it difficult for convolutional neural networks (CNNs) to accurately classify these anomalies, especially when working with limited datasets. Additionally, current approaches to paranasal anomaly classification are constrained to identifying a single anomaly at a time. These challenges call for further research and development in this area.
In this study, we investigate the feasibility of using a 3D convolutional neural network (CNN) to classify healthy maxillary sinuses (MS) and MS with polyps or cysts. The task of accurately identifying the relevant MS volume within larger head and neck Magnetic Resonance Imaging (MRI) scans can be difficult, but we develop a straightforward strategy to tackle this challenge. Our end-to-end solution includes the use of a novel sampling technique that not only effectively localizes the relevant MS volume, but also increases the size of the training dataset and improves classification results. Additionally, we employ a multiple instance ensemble prediction method to further boost classification performance. Finally, we identify the optimal size of MS volumes to achieve the highest possible classification performance on our dataset.
With our multiple instance ensemble prediction strategy and sampling strategy, our 3D CNNs achieve an F1 of 0.85 whereas without it, they achieve an F1 of 0.70.
We demonstrate the feasibility of classifying anomalies in the MS. We propose a data enlarging strategy alongside a novel ensembling strategy that proves to be beneficial for paranasal anomaly classification in the MS.
Submitted 31 March, 2023;
originally announced March 2023.
-
Unsupervised Anomaly Detection of Paranasal Anomalies in the Maxillary Sinus
Authors:
Debayan Bhattacharya,
Finn Behrendt,
Benjamin Tobias Becker,
Dirk Beyersdorff,
Elina Petersen,
Marvin Petersen,
Bastian Cheng,
Dennis Eggert,
Christian Betz,
Anna Sophie Hoffmann,
Alexander Schlaefer
Abstract:
Deep learning (DL) algorithms can be used to automate paranasal anomaly detection from Magnetic Resonance Imaging (MRI). However, previous works relied on supervised learning techniques to distinguish between normal and abnormal samples. This method limits the types of anomalies that can be classified, as the anomalies need to be present in the training data. Further, many data points from the normal and anomalous classes are needed for the model to achieve satisfactory classification performance. However, experienced clinicians can distinguish between normal samples (healthy maxillary sinus) and anomalous samples (anomalous maxillary sinus) after looking at a few normal samples. We mimic the clinicians' ability by learning the distribution of healthy maxillary sinuses using a 3D convolutional auto-encoder (cAE) and its variant, a 3D variational autoencoder (VAE) architecture, and evaluate cAE and VAE for this task. Concretely, we pose paranasal anomaly detection as an unsupervised anomaly detection problem. Thereby, we are able to reduce the labelling effort of the clinicians, as we only use healthy samples during training. Additionally, we can classify any type of anomaly that differs from the training distribution. We train our 3D cAE and VAE to learn a latent representation of healthy maxillary sinus volumes using L1 reconstruction loss. During inference, we use the reconstruction error to classify maxillary sinuses as normal or anomalous. We extract sub-volumes from larger head and neck MRIs and analyse the effect of different fields of view on the detection performance. Finally, we report which anomalies are easiest and hardest to classify using our approach. Our results demonstrate the feasibility of unsupervised detection of paranasal anomalies from MRIs with an AUPRC of 85% and 80% for cAE and VAE, respectively.
Submitted 1 November, 2022;
originally announced November 2022.
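The inference step described in the abstract above (score a volume by its reconstruction error, then decide normal versus anomalous) can be sketched as follows. This is a minimal illustration and not the authors' implementation: `toy_reconstruct` is a hypothetical stand-in for a trained autoencoder, volumes are flattened to plain lists of floats for brevity, and choosing the threshold as a quantile of errors on healthy validation volumes is an assumption, since the abstract does not say how the decision threshold is set.

```python
from typing import Callable, List, Sequence

Volume = Sequence[float]  # flattened voxel intensities (simplifying assumption)


def l1_error(volume: Volume, reconstruction: Volume) -> float:
    """Mean absolute (L1) difference between a volume and its reconstruction."""
    if len(volume) != len(reconstruction):
        raise ValueError("volume and reconstruction must have the same length")
    return sum(abs(a - b) for a, b in zip(volume, reconstruction)) / len(volume)


def threshold_from_healthy(errors: Sequence[float], quantile: float = 0.95) -> float:
    """Hypothetical rule: take a quantile of the errors seen on healthy volumes."""
    if not errors:
        raise ValueError("need at least one healthy reconstruction error")
    ordered = sorted(errors)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]


def is_anomalous(
    volume: Volume,
    reconstruct: Callable[[Volume], Volume],
    threshold: float,
) -> bool:
    """Flag a volume whose reconstruction error exceeds the threshold."""
    return l1_error(volume, reconstruct(volume)) > threshold


def toy_reconstruct(volume: Volume) -> List[float]:
    """Stand-in for a trained autoencoder: it can only reproduce values in [0, 1],
    mimicking a model that has learned the healthy intensity range."""
    return [min(max(v, 0.0), 1.0) for v in volume]


# Calibrate on healthy volumes only, then score an unseen volume.
healthy_volumes = [[0.1, 0.5, 0.9], [0.2, 0.4, 1.05], [0.0, 0.3, 0.7]]
healthy_errors = [l1_error(v, toy_reconstruct(v)) for v in healthy_volumes]
threshold = threshold_from_healthy(healthy_errors)
flagged = is_anomalous([0.2, 3.0, 0.5], toy_reconstruct, threshold)
```

Because only healthy samples are needed to fit the model and the threshold, any volume the model reconstructs poorly is flagged, whatever kind of anomaly it contains.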
-
Discrete Communication and Control Updating in Event-Triggered Consensus
Authors:
Bin Cheng,
Yuezu Lv,
Zhongkui Li,
Zhisheng Duan
Abstract:
This paper studies the consensus control problem under three essential demands, namely, discrete control updating for each agent, discrete-time communications among neighboring agents, and fully distributed controller implementation that does not require any global information about the whole network topology. Noting that the existing related results, which meet at most one or two of these demands, are essentially not applicable, in this paper we establish a novel framework to solve the problem of fully distributed consensus with discrete communication and control. The first key point in this framework is the design of controllers that are only updated at discrete event instants and do not depend on global information, achieved by introducing time-varying gains inspired by the adaptive control technique. Another key point is the invention of novel dynamic triggering functions that are independent of relative information among neighboring agents. Under the established framework, we propose fully distributed state-feedback event-triggered protocols for undirected graphs and further study the more complex cases of output-feedback control and directed graphs. Finally, numerical examples are provided to verify the effectiveness of the proposed event-triggered protocols.
Submitted 26 October, 2022;
originally announced October 2022.
-
Inverted Landing in a Small Aerial Robot via Deep Reinforcement Learning for Triggering and Control of Rotational Maneuvers
Authors:
Bryan Habas,
Jack W. Langelaan,
Bo Cheng
Abstract:
Inverted landing in a rapid and robust manner is a challenging feat for aerial robots, especially while depending entirely on onboard sensing and computation. In spite of this, this feat is routinely performed by biological fliers such as bats, flies, and bees. Our previous work has identified a direct causal connection between a series of onboard visual cues and kinematic actions that allow for reliable execution of this challenging aerobatic maneuver in small aerial robots. In this work, we first utilized Deep Reinforcement Learning and a physics-based simulation to obtain a general, optimal control policy for robust inverted landing starting from any arbitrary approach condition. This optimized control policy provides a computationally-efficient mapping from the system's observational space to its motor command action space, including both triggering and control of rotational maneuvers. This was done by training the system over a large range of approach flight velocities that varied with magnitude and direction.
Next, we performed a sim-to-real transfer and experimental validation of the learned policy via domain randomization, by varying the robot's inertial parameters in the simulation. Through experimental trials, we identified several dominant factors which greatly improved landing robustness and the primary mechanisms that determined inverted landing success. We expect the learning framework developed in this study can be generalized to solve more challenging tasks, such as utilizing noisy onboard sensory data, landing on surfaces of various orientations, or landing on dynamically-moving surfaces.
Submitted 25 April, 2023; v1 submitted 22 September, 2022;
originally announced September 2022.
-
Supervised Contrastive Learning to Classify Paranasal Anomalies in the Maxillary Sinus
Authors:
Debayan Bhattacharya,
Benjamin Tobias Becker,
Finn Behrendt,
Marcel Bengs,
Dirk Beyersdorff,
Dennis Eggert,
Elina Petersen,
Florian Jansen,
Marvin Petersen,
Bastian Cheng,
Christian Betz,
Alexander Schlaefer,
Anna Sophie Hoffmann
Abstract:
Using deep learning techniques, anomalies in the paranasal sinus system can be detected automatically in MRI images and can be further analyzed and classified based on their volume, shape and other parameters like local contrast. However, due to limited training data, traditional supervised learning methods often fail to generalize. Existing deep learning methods in paranasal anomaly classification have been used to diagnose at most one anomaly. In our work, we consider three anomalies. Specifically, we employ a 3D CNN to separate maxillary sinus volumes without anomalies from maxillary sinus volumes with anomalies. To learn robust representations from a small labelled dataset, we propose a novel learning paradigm that combines contrastive loss and cross-entropy loss. In particular, we use a supervised contrastive loss that encourages embeddings of maxillary sinus volumes with and without anomalies to form two distinct clusters, while the cross-entropy loss encourages the 3D CNN to maintain its discriminative ability. We report that optimising with both losses is advantageous over optimising with only one loss. We also find that our training strategy leads to label efficiency. With our method, a 3D CNN classifier achieves an AUROC of 0.85, while a 3D CNN classifier optimised with cross-entropy loss alone achieves an AUROC of 0.66.
Submitted 5 September, 2022;
originally announced September 2022.
-
Optimal Inverted Landing in a Small Aerial Robot with Varied Approach Velocities and Landing Gear Designs
Authors:
Bryan Habas,
Bader AlAttar,
Brian Davis,
Jack W. Langelaan,
Bo Cheng
Abstract:
Inverted landing is a challenging feat to perform in aerial robots, especially without external positioning. However, it is routinely performed by biological fliers such as bees, flies, and bats. Our previous observations of landing behaviors in flies suggest an open-loop causal relationship between their putative visual cues and the kinematics of the aerial maneuvers executed. For example, the degree of rotational maneuver (the amount of body inversion prior to touchdown) and the amount of leg-assisted body swing both depend on the flies' initial body states while approaching the ceiling. In this work, inspired by the inverted landing behavior of flies, we used a physics-based simulation with experimental validation to systematically investigate how optimized inverted landing maneuvers depend on the initial approach velocities with varied magnitude and direction. This was done by analyzing the putative visual cues (that can be derived from onboard measurements) during optimal maneuvering trajectories. We identified a three-dimensional policy region, from which a mapping to a global inverted landing policy can be developed without the use of external positioning data. Through simulation, we also investigated the effects of an array of landing gear designs on the optimized landing performance and identified their advantages and disadvantages. The above results have been partially validated using limited experimental testing and will continue to inform and guide our future experiments, for example by applying the calculated global policy.
Submitted 3 March, 2022; v1 submitted 5 November, 2021;
originally announced November 2021.
-
Encoding Distributional Soft Actor-Critic for Autonomous Driving in Multi-lane Scenarios
Authors:
Jingliang Duan,
Yangang Ren,
Fawang Zhang,
Yang Guan,
Dongjie Yu,
Shengbo Eben Li,
Bo Cheng,
Lin Zhao
Abstract:
In this paper, we propose a new reinforcement learning (RL) algorithm, called encoding distributional soft actor-critic (E-DSAC), for decision-making in autonomous driving. Unlike existing RL-based decision-making methods, E-DSAC is suitable for situations where the number of surrounding vehicles is variable and eliminates the requirement for manually pre-designed sorting rules, resulting in higher policy performance and generality. We first develop an encoding distributional policy iteration (DPI) framework by embedding a permutation invariant module, which employs a feature neural network (NN) to encode the indicators of each vehicle, in the distributional RL framework. The proposed DPI framework is proved to exhibit important properties in terms of convergence and global optimality. Next, based on the developed encoding DPI framework, we propose the E-DSAC algorithm by adding the gradient-based update rule of the feature NN to the policy evaluation process of the DSAC algorithm. Then, the multi-lane driving task and the corresponding reward function are designed to verify the effectiveness of the proposed algorithm. Results show that the policy learned by E-DSAC can realize efficient, smooth, and relatively safe autonomous driving in the designed scenario. Moreover, the final performance of the policy learned by E-DSAC is about three times that of DSAC. Furthermore, its effectiveness has also been verified in real vehicle experiments.
Submitted 12 September, 2021;
originally announced September 2021.
-
Fixed-Dimensional and Permutation Invariant State Representation of Autonomous Driving
Authors:
Jingliang Duan,
Dongjie Yu,
Shengbo Eben Li,
Wenxuan Wang,
Yangang Ren,
Ziyu Lin,
Bo Cheng
Abstract:
In this paper, we propose a new state representation method, called encoding sum and concatenation (ESC), for the state representation of decision-making in autonomous driving. Unlike existing state representation methods, ESC is applicable to a variable number of surrounding vehicles and eliminates the need for manually pre-designed sorting rules, leading to higher representation ability and generality. The proposed ESC method introduces a representation neural network (NN) to encode each surrounding vehicle into an encoding vector, and then adds these vectors to obtain the representation vector of the set of surrounding vehicles. By concatenating the set representation with other variables, such as indicators of the ego vehicle and road, we realize the fixed-dimensional and permutation invariant state representation. This paper has further proved that the proposed ESC method can realize the injective representation if the output dimension of the representation NN is greater than the number of variables of all surrounding vehicles. This means that by taking the ESC representation as policy inputs, we can find the nearly optimal representation NN and policy NN by simultaneously optimizing them using gradient-based updating. Experiments demonstrate that compared with the fixed-permutation representation method, the proposed method improves the representation ability of the surrounding vehicles, and the corresponding approximation error is reduced by 62.2%.
Submitted 4 March, 2022; v1 submitted 24 May, 2021;
originally announced May 2021.
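
The encode, sum, and concatenate steps described in the ESC abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: `encode_vehicle` is a hypothetical fixed feature map standing in for the learned representation NN, and the names `esc_state`, `out_dim`, and the feature layout are assumptions made for illustration only.

```python
import math
from typing import List, Sequence


def encode_vehicle(features: Sequence[float], out_dim: int = 8) -> List[float]:
    """Hypothetical stand-in for the representation NN (a fixed nonlinear map)."""
    # A trained network would replace this; the fixed map keeps the sketch deterministic.
    return [
        math.tanh(sum(0.1 * (k + 1) * (i + 1) * x for i, x in enumerate(features)))
        for k in range(out_dim)
    ]


def esc_state(
    ego: Sequence[float],
    vehicles: Sequence[Sequence[float]],
    out_dim: int = 8,
) -> List[float]:
    """Encode each surrounding vehicle, sum the encodings, then concatenate ego features."""
    summed = [0.0] * out_dim
    for vehicle in vehicles:
        for k, value in enumerate(encode_vehicle(vehicle, out_dim)):
            summed[k] += value  # summation makes the result independent of vehicle order
    return summed + list(ego)  # fixed length: out_dim + len(ego)


ego_features = [1.0, 0.5]
surrounding = [[0.1, 0.2, 0.3], [1.0, -1.0, 0.5], [2.0, 0.0, -0.3]]
state = esc_state(ego_features, surrounding)
```

Because the per-vehicle encodings are summed, the output length stays at `out_dim + len(ego)` for any number of surrounding vehicles, and reordering the vehicles changes the result only by floating-point rounding.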
-
DPN-SENet: A self-attention mechanism neural network for detection and diagnosis of COVID-19 from chest x-ray images
Authors:
Bo Cheng,
Ruhui Xue,
Hang Yang,
Laili Zhu,
Wei Xiang
Abstract:
Background and Objective: The disease caused by the new coronavirus is known as COVID-19. It began to spread at the end of 2019 and has now spread across the world. As of October 2020, it had infected around 37 million people and claimed about 1 million lives. We propose a deep learning model that can help radiologists and clinicians use chest X-rays to diagnose COVID-19 cases and show the diagnostic features of pneumonia. Methods: The approach in this study has three parts: 1) we propose a data enhancement method to increase the diversity of the data set, thereby improving the generalization performance of the model; 2) our deep convolutional neural network model, DPN-SE, adds a self-attention mechanism to the DPN network, which greatly improves the performance of the network; 3) we use the LIME interpretability library to mark the feature regions on the X-ray image, helping doctors diagnose COVID-19 more quickly. Results: Under the same network model, data with and without data enhancement were used for training. Comparing the two sets of experimental results, among the 10 network models with different structures, 7 improved after using data enhancement, with an average improvement of 1% in recognition accuracy. The accuracy and recall of the DPN-SE network are 93% and 98%, respectively (COVID vs. bacterial pneumonia vs. viral pneumonia vs. normal). Compared with the original DPN, the accuracy is improved by 2%. Conclusion: The data augmentation method we used achieved effective results on a small data set, showing that a reasonable data augmentation method can improve the recognition accuracy without changing the sample size and model structure. Overall, the proposed method and model can become a useful tool for clinical radiologists.
Submitted 20 May, 2021;
originally announced May 2021.
-
Recurrent Model Predictive Control
Authors:
Zhengyu Liu,
Jingliang Duan,
Wenxuan Wang,
Shengbo Eben Li,
Yuming Yin,
Ziyu Lin,
Qi Sun,
Bo Cheng
Abstract:
This paper proposes an off-line algorithm, called Recurrent Model Predictive Control (RMPC), to solve general nonlinear finite-horizon optimal control problems. Unlike traditional Model Predictive Control (MPC) algorithms, it can make full use of the current computing resources and adaptively select the longest model prediction horizon. Our algorithm employs a recurrent function to approximate the optimal policy, which maps the system states and reference values directly to the control inputs. The number of prediction steps is equal to the number of recurrent cycles of the learned policy function. With an arbitrary initial policy function, the proposed RMPC algorithm can converge to the optimal policy by directly minimizing the designed loss function. We further prove the convergence and optimality of the RMPC algorithm through Bellman's principle of optimality, and demonstrate its generality and efficiency using two numerical examples.
Submitted 23 February, 2021;
originally announced February 2021.
-
Recurrent Model Predictive Control: Learning an Explicit Recurrent Controller for Nonlinear Systems
Authors:
Zhengyu Liu,
Jingliang Duan,
Wenxuan Wang,
Shengbo Eben Li,
Yuming Yin,
Ziyu Lin,
Bo Cheng
Abstract:
This paper proposes an offline control algorithm, called Recurrent Model Predictive Control (RMPC), to solve large-scale nonlinear finite-horizon optimal control problems. It can be regarded as an explicit solver of traditional Model Predictive Control (MPC) algorithms, which can adaptively select an appropriate model prediction horizon according to current computing resources, so as to improve the policy performance. Our algorithm employs a recurrent function to approximate the optimal policy, which maps the system states and reference values directly to the control inputs. The output of the learned policy network after N recurrent cycles corresponds to the nearly optimal solution of N-step MPC. A policy optimization objective is designed by decomposing the MPC cost function according to Bellman's principle of optimality. The optimal recurrent policy can be obtained by directly minimizing the designed objective function, which is applicable to general nonlinear and non-input-affine systems. Both simulation-based and real-robot path-tracking tasks are utilized to demonstrate the effectiveness of the proposed method.
Submitted 8 April, 2022; v1 submitted 20 February, 2021;
originally announced February 2021.
-
Event-Triggered Consensus of Homogeneous and Heterogeneous Multi-Agent Systems with Jointly Connected Switching Topologies
Authors:
Bin Cheng,
Xiangke Wang,
Zhongkui Li
Abstract:
This paper investigates the distributed event-based consensus problem of switching networks satisfying the jointly connected condition. Both the state consensus of homogeneous linear networks and output consensus of heterogeneous networks are studied. Two kinds of event-based protocols based on local sampled information are designed, without the need to solve any matrix equation or inequality. Theoretical analysis indicates that the proposed event-based protocols guarantee the achievement of consensus and the exclusion of Zeno behaviors for jointly connected undirected switching graphs. These protocols, relying on no global knowledge of the network topology and independent of switching rules, can be devised and utilized in a completely distributed manner. They are able to avoid continuous information exchanges for either controllers' updating or triggering functions' monitoring, which ensures the feasibility of the presented protocols.
Submitted 22 September, 2020; v1 submitted 25 March, 2020;
originally announced March 2020.
-
Mixed Reinforcement Learning with Additive Stochastic Uncertainty
Authors:
Yao Mu,
Shengbo Eben Li,
Chang Liu,
Qi Sun,
Bingbing Nie,
Bo Cheng,
Baiyu Peng
Abstract:
Reinforcement learning (RL) methods often rely on massive exploration data to search for optimal policies, and suffer from poor sampling efficiency. This paper presents a mixed reinforcement learning (mixed RL) algorithm that simultaneously uses dual representations of environmental dynamics to search for the optimal policy, with the purpose of improving both learning accuracy and training speed. The dual representations are the environmental model and the state-action data: the former can accelerate the learning process of RL, while its inherent model uncertainty generally leads to worse policy accuracy than the latter, which comes from direct measurements of states and actions. In the framework design of the mixed RL, the compensation of the additive stochastic model uncertainty is embedded inside the policy iteration RL framework by using explored state-action data via an iterative Bayesian estimator (IBE). The optimal policy is then computed in an iterative way by alternating between policy evaluation (PEV) and policy improvement (PIM). The convergence of the mixed RL is proved using Bellman's principle of optimality, and the recursive stability of the generated policy is proved via Lyapunov's direct method. The effectiveness of the mixed RL is demonstrated by a typical optimal control problem of stochastic non-affine nonlinear systems (i.e., double lane change task with an automated vehicle).
Submitted 28 February, 2020;
originally announced March 2020.
-
Hierarchical Reinforcement Learning for Self-Driving Decision-Making without Reliance on Labeled Driving Data
Authors:
Jingliang Duan,
Shengbo Eben Li,
Yang Guan,
Qi Sun,
Bo Cheng
Abstract:
Decision making for self-driving cars is usually tackled by manually encoding rules from drivers' behaviors or imitating drivers' manipulation using supervised learning techniques. Both of them rely on mass driving data to cover all possible driving scenarios. This paper presents a hierarchical reinforcement learning method for decision making of self-driving cars, which does not depend on a large amount of labeled driving data. This method comprehensively considers both high-level maneuver selection and low-level motion control in both lateral and longitudinal directions. We firstly decompose the driving tasks into three maneuvers, including driving in lane, right lane change and left lane change, and learn the sub-policy for each maneuver. Then, a master policy is learned to choose the maneuver policy to be executed in the current state. All policies including master policy and maneuver policies are represented by fully-connected neural networks and trained by using asynchronous parallel reinforcement learners (APRL), which builds a mapping from the sensory outputs to driving decisions. Different state spaces and reward functions are designed for each maneuver. We apply this method to a highway driving scenario, which demonstrates that it can realize smooth and safe decision making for self-driving cars.
Submitted 27 January, 2020;
originally announced January 2020.
-
Distributional Soft Actor-Critic: Off-Policy Reinforcement Learning for Addressing Value Estimation Errors
Authors:
Jingliang Duan,
Yang Guan,
Shengbo Eben Li,
Yangang Ren,
Bo Cheng
Abstract:
In reinforcement learning (RL), function approximation errors are known to easily lead to the Q-value overestimations, thus greatly reducing policy performance. This paper presents a distributional soft actor-critic (DSAC) algorithm, which is an off-policy RL method for continuous control setting, to improve the policy performance by mitigating Q-value overestimations. We first discover in theory that learning a distribution function of state-action returns can effectively mitigate Q-value overestimations because it is capable of adaptively adjusting the update stepsize of the Q-value function. Then, a distributional soft policy iteration (DSPI) framework is developed by embedding the return distribution function into maximum entropy RL. Finally, we present a deep off-policy actor-critic variant of DSPI, called DSAC, which directly learns a continuous return distribution by keeping the variance of the state-action returns within a reasonable range to address exploding and vanishing gradient problems. We evaluate DSAC on the suite of MuJoCo continuous control tasks, achieving the state-of-the-art performance.
Submitted 11 June, 2021; v1 submitted 8 January, 2020;
originally announced January 2020.
-
Adaptive dynamic programming for nonaffine nonlinear optimal control problem with state constraints
Authors:
Jingliang Duan,
Zhengyu Liu,
Shengbo Eben Li,
Qi Sun,
Zhenzhong Jia,
Bo Cheng
Abstract:
This paper presents a constrained adaptive dynamic programming (CADP) algorithm to solve general nonlinear nonaffine optimal control problems with known dynamics. Unlike previous ADP algorithms, it can directly deal with problems with state constraints. Firstly, a constrained generalized policy iteration (CGPI) framework is developed to handle state constraints by transforming the traditional policy improvement process into a constrained policy optimization problem. Next, we propose an actor-critic variant of CGPI, called CADP, in which both policy and value functions are approximated by multi-layer neural networks to directly map the system states to control inputs and value function, respectively. CADP linearizes the constrained optimization problem locally into a quadratically constrained linear programming problem, and then obtains the optimal update of the policy network by solving its dual problem. A trust region constraint is added to prevent excessive policy update, thus ensuring linearization accuracy. We determine the feasibility of the policy optimization problem by calculating the minimum trust region boundary and update the policy using two recovery rules when infeasible. The vehicle control problem in the path-tracking task is used to demonstrate the effectiveness of this proposed method.
Submitted 8 April, 2022; v1 submitted 26 November, 2019;
originally announced November 2019.
-
Panoptic-DeepLab
Authors:
Bowen Cheng,
Maxwell D. Collins,
Yukun Zhu,
Ting Liu,
Thomas S. Huang,
Hartwig Adam,
Liang-Chieh Chen
Abstract:
We present Panoptic-DeepLab, a bottom-up and single-shot approach for panoptic segmentation. Our Panoptic-DeepLab is conceptually simple and delivers state-of-the-art results. In particular, we adopt the dual-ASPP and dual-decoder structures specific to semantic and instance segmentation, respectively. The semantic segmentation branch is the same as the typical design of any semantic segmentation model (e.g., DeepLab), while the instance segmentation branch is class-agnostic, involving a simple instance center regression. Our single Panoptic-DeepLab sets the new state of the art on all three Cityscapes benchmarks, reaching 84.2% mIoU, 39.0% AP, and 65.5% PQ on the test set, and advances results on the challenging Mapillary Vistas dataset.
Submitted 23 October, 2019; v1 submitted 10 October, 2019;
originally announced October 2019.
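
The abstract does not spell out how "instance center regression" turns into instance masks. One common way to use such a regression, sketched below under stated assumptions, is to let every pixel predict an offset to its instance center and then assign the pixel to the nearest detected center. The function name `group_pixels`, the dictionary-based data layout, and the assumption that centers have already been detected are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]


def group_pixels(
    offsets: Dict[Pixel, Tuple[float, float]],
    centers: List[Tuple[float, float]],
) -> Dict[Pixel, int]:
    """Assign each pixel to the index of the center closest to its regressed location."""
    assignment: Dict[Pixel, int] = {}
    for (y, x), (dy, dx) in offsets.items():
        # Location the pixel "votes" for: its own position plus the predicted offset.
        voted_y, voted_x = y + dy, x + dx
        assignment[(y, x)] = min(
            range(len(centers)),
            key=lambda k: (centers[k][0] - voted_y) ** 2 + (centers[k][1] - voted_x) ** 2,
        )
    return assignment


# Toy example: two assumed instance centers and four pixels with predicted offsets.
toy_centers = [(1.0, 1.0), (5.0, 6.0)]
toy_offsets = {(0, 0): (1.0, 1.0), (0, 1): (1.0, 0.0), (5, 5): (0.0, 1.0), (6, 6): (-1.0, 0.0)}
instance_ids = group_pixels(toy_offsets, toy_centers)
```

The grouping is class-agnostic: it uses only geometry, so class labels would have to come from a separate semantic branch.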
-
HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation
Authors:
Bowen Cheng,
Bin Xiao,
Jingdong Wang,
Honghui Shi,
Thomas S. Huang,
Lei Zhang
Abstract:
Bottom-up human pose estimation methods have difficulties in predicting the correct pose for small persons due to challenges in scale variation. In this paper, we present HigherHRNet: a novel bottom-up human pose estimation method for learning scale-aware representations using high-resolution feature pyramids. Equipped with multi-resolution supervision for training and multi-resolution aggregation for inference, the proposed approach is able to solve the scale variation challenge in bottom-up multi-person pose estimation and localize keypoints more precisely, especially for small persons. The feature pyramid in HigherHRNet consists of feature map outputs from HRNet and upsampled higher-resolution outputs through a transposed convolution. HigherHRNet outperforms the previous best bottom-up method by 2.5% AP for medium persons on COCO test-dev, showing its effectiveness in handling scale variation. Furthermore, HigherHRNet achieves a new state-of-the-art result on COCO test-dev (70.5% AP) without using refinement or other post-processing techniques, surpassing all existing bottom-up methods. HigherHRNet even surpasses all top-down methods on CrowdPose test (67.6% AP), suggesting its robustness in crowded scenes. The code and models are available at https://github.com/HRNet/Higher-HRNet-Human-Pose-Estimation.
Submitted 12 March, 2020; v1 submitted 27 August, 2019;
originally announced August 2019.
-
A neural network approach to GOP-level rate control of x265 using Lookahead
Authors:
Boya Cheng,
Yuan Zhang
Abstract:
To optimize the perceived quality under a specific bitrate constraint, multi-pass encoding is usually performed with the rate control mode of the average bitrate (ABR) or the constant rate factor (CRF) to distribute bits as reasonably as possible in terms of perceived quality, leading to high computational complexity. In this paper, we propose to utilize the video information generated during the encoding to adaptively adjust the CRF setting at GOP level, ensuring the bits of frames in each GOP are allocated reasonably under the bitrate constraint with a single-pass encoding framework. In particular, due to the inherent relationship between CRF values and bitrates, we adopt a shallow neural network (NN) to map video content features to the CRF-bitrate model. The content-related features are collected from the lookahead module inside the x265 encoder, including encoding cost estimation, motion vectors, and so on. Further, a rate control method, called content adaptive rate factor (CARF), is proposed to adjust the CRF value of each GOP to meet the target bitrate by using the predicted CRF-bitrate models of each GOP. The experimental results show that the proposed approach brings 84.5% of the testing data within a 20% bitrate error (or better) and outperforms the ABR mode in x265, leading to a 5.23% BD-rate reduction on average.
Submitted 24 October, 2019; v1 submitted 8 August, 2019;
originally announced August 2019.
-
Leveraging Acoustic Cues and Paralinguistic Embeddings to Detect Expression from Voice
Authors:
Vikramjit Mitra,
Sue Booker,
Erik Marchi,
David Scott Farrar,
Ute Dorothea Peitz,
Bridget Cheng,
Ermine Teves,
Anuj Mehta,
Devang Naik
Abstract:
Millions of people reach out to digital assistants such as Siri every day, asking for information, making phone calls, seeking assistance, and much more. The expectation is that such assistants should understand the intent of the user's query. Detecting the intent of a query from a short, isolated utterance is a difficult task. Intent cannot always be obtained from speech-recognized transcriptions. A transcription-driven approach can interpret what has been said but fails to acknowledge how it has been said, and as a consequence, may ignore the expression present in the voice. Our work investigates whether a system can reliably detect vocal expression in queries using acoustic and paralinguistic embedding. Results show that the proposed method offers a relative equal error rate (EER) decrease of 60% compared to a bag-of-words-based system, corroborating that expression is significantly represented by vocal attributes, rather than being purely lexical. Adding an emotion embedding helped to reduce the EER by 30% relative to the acoustic embedding, demonstrating the relevance of emotion in expressive voice.
Submitted 28 June, 2019;
originally announced July 2019.
-
Fully Distributed Event-Triggered Protocols for Linear Multi-Agent Networks
Authors:
Bin Cheng,
Zhongkui Li
Abstract:
This paper considers the distributed event-triggered consensus problem for general linear multi-agent networks. Both the leaderless and leader-follower consensus problems are considered. Based on the local sampled state or local output information, distributed adaptive event-triggered protocols are designed, which can ensure that consensus of the agents is achieved and the Zeno behavior is excluded by showing that the interval between any two triggering events is lower bounded by a strictly positive value. Compared to the previous related works, our main contribution is that the proposed adaptive event-based protocols are fully distributed and scalable, which do not rely on any global information of the network graph and are independent of the network's scale. In these event-based protocols, continuous communications are not required for either control laws updating or triggering functions monitoring.
Submitted 13 July, 2018;
originally announced July 2018.
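
The idea of event-triggered consensus, where agents broadcast only when their state has drifted far enough from the last broadcast value, can be illustrated with a small discrete-time simulation. This sketch is not the adaptive protocol of the paper: it assumes single-integrator agents, a fixed undirected graph, and a simple exponentially decaying trigger threshold, a common textbook simplification. The function name `event_triggered_consensus` and the parameters `c`, `alpha`, `dt`, and `steps` are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple


def event_triggered_consensus(
    adj: Sequence[Sequence[float]],
    x0: Sequence[float],
    dt: float = 0.01,
    steps: int = 2000,
    c: float = 0.5,
    alpha: float = 0.5,
) -> Tuple[List[float], List[int]]:
    """Simulate consensus in which each agent uses only last-broadcast states."""
    n = len(x0)
    x = list(x0)        # true states
    x_hat = list(x0)    # most recently broadcast states
    events = [0] * n    # number of broadcasts per agent
    for step in range(steps):
        threshold = c * math.exp(-alpha * step * dt)  # assumed decaying trigger level
        for i in range(n):
            # Agent i broadcasts only when its own measurement error is too large.
            if abs(x_hat[i] - x[i]) > threshold:
                x_hat[i] = x[i]
                events[i] += 1
        # Control law built from broadcast values only (no continuous communication).
        u = [-sum(adj[i][j] * (x_hat[i] - x_hat[j]) for j in range(n)) for i in range(n)]
        x = [x[i] + dt * u[i] for i in range(n)]  # forward-Euler integration step
    return x, events


# Path graph on four agents (symmetric adjacency matrix).
path_graph = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
final_states, event_counts = event_triggered_consensus(path_graph, [0.0, 1.0, 4.0, 7.0])
```

Because the adjacency matrix is symmetric, the control inputs sum to zero and the average of the states is preserved, so the agents approach the initial mean. A discrete-time simulation cannot exhibit Zeno behavior by construction, so it says nothing about the lower bound on inter-event times that the paper proves.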