Search | arXiv e-print repository

Real-time Dexterous Telemanipulation with an End-Effect-Oriented Learning-based Approach

Authors: Haoyang Wang, He Bai, Xiaoli Zhang, Yunsik Jung, Michel Bowman, Lingfeng Tao

Abstract: Dexterous telemanipulation is crucial in advancing human-robot systems, especially in tasks requiring precise and safe manipulation. However, it faces significant challenges due to the physical differences between human and robotic hands, the dynamic interaction with objects, and the indirect control and perception of the remote environment. Current approaches predominantly focus on mapping the hu… ▽ More Dexterous telemanipulation is crucial in advancing human-robot systems, especially in tasks requiring precise and safe manipulation. However, it faces significant challenges due to the physical differences between human and robotic hands, the dynamic interaction with objects, and the indirect control and perception of the remote environment. Current approaches predominantly focus on mapping the human hand onto robotic counterparts to replicate motions, which exhibits a critical oversight: it often neglects the physical interaction with objects and relegates the interaction burden to the human to adapt and make laborious adjustments in response to the indirect and counter-intuitive observation of the remote environment. This work develops an End-Effects-Oriented Learning-based Dexterous Telemanipulation (EFOLD) framework to address telemanipulation tasks. EFOLD models telemanipulation as a Markov Game, introducing multiple end-effect features to interpret the human operator's commands during interaction with objects. These features are used by a Deep Reinforcement Learning policy to control the robot and reproduce such end effects. EFOLD was evaluated with real human subjects and two end-effect extraction methods for controlling a virtual Shadow Robot Hand in telemanipulation tasks. EFOLD achieved real-time control capability with low command following latency (delay<0.11s) and highly accurate tracking (MSE<0.084 rad). △ Less

Submitted 1 August, 2024; originally announced August 2024.

Comments: Accepted by IROS 2024

arXiv:2407.15259 [pdf, other]

New Rules for Causal Identification with Background Knowledge

Authors: Tian-Zuo Wang, Lue Tao, Zhi-Hua Zhou

Abstract: Identifying causal relations is crucial for a variety of downstream tasks. In additional to observational data, background knowledge (BK), which could be attained from human expertise or experiments, is usually introduced for uncovering causal relations. This raises an open problem that in the presence of latent variables, what causal relations are identifiable from observational data and BK. In t… ▽ More Identifying causal relations is crucial for a variety of downstream tasks. In additional to observational data, background knowledge (BK), which could be attained from human expertise or experiments, is usually introduced for uncovering causal relations. This raises an open problem that in the presence of latent variables, what causal relations are identifiable from observational data and BK. In this paper, we propose two novel rules for incorporating BK, which offer a new perspective to the open problem. In addition, we show that these rules are applicable in some typical causality tasks, such as determining the set of possible causal effects with observational data. Our rule-based approach enhances the state-of-the-art method by circumventing a process of enumerating block sets that would otherwise take exponential complexity. △ Less

Submitted 21 July, 2024; originally announced July 2024.

arXiv:2407.06087 [pdf, other]

Analytic Convolutional Layer: A Step to Analytic Neural Network

Authors: Jingmao Cui, Donglai Tao, Linmi Tao, Ruiyang Liu, Yu Cheng

Abstract: The prevailing approach to embedding prior knowledge within convolutional layers typically includes the design of steerable kernels or their modulation using designated kernel banks. In this study, we introduce the Analytic Convolutional Layer (ACL), an innovative model-driven convolutional layer, which is a mosaic of analytical convolution kernels (ACKs) and traditional convolution kernels. ACKs… ▽ More The prevailing approach to embedding prior knowledge within convolutional layers typically includes the design of steerable kernels or their modulation using designated kernel banks. In this study, we introduce the Analytic Convolutional Layer (ACL), an innovative model-driven convolutional layer, which is a mosaic of analytical convolution kernels (ACKs) and traditional convolution kernels. ACKs are characterized by mathematical functions governed by analytic kernel parameters (AKPs) learned in training process. Learnable AKPs permit the adaptive update of incorporated knowledge to align with the features representation of data. Our extensive experiments demonstrate that the ACLs not only have a remarkable capacity for feature representation with a reduced number of parameters but also attain increased reliability through the analytical formulation of ACKs. Furthermore, ACLs offer a means for neural network interpretation, thereby paving the way for the intrinsic interpretability of neural network. The source code will be published in company with the paper. △ Less

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03374 [pdf]

An Outline of Prognostics and Health Management Large Model: Concepts, Paradigms, and Challenges

Authors: Laifa Tao, Shangyu Li, Haifei Liu, Qixuan Huang, Liang Ma, Guoao Ning, Yiling Chen, Yunlong Wu, Bin Li, Weiwei Zhang, Zhengduo Zhao, Wenchao Zhan, Wenyan Cao, Chao Wang, Hongmei Liu, Jian Ma, Mingliang Suo, Yujie Cheng, Yu Ding, Dengwei Song, Chen Lu

Abstract: Prognosis and Health Management (PHM), critical for ensuring task completion by complex systems and preventing unexpected failures, is widely adopted in aerospace, manufacturing, maritime, rail, energy, etc. However, PHM's development is constrained by bottlenecks like generalization, interpretation and verification abilities. Presently, generative artificial intelligence (AI), represented by Larg… ▽ More Prognosis and Health Management (PHM), critical for ensuring task completion by complex systems and preventing unexpected failures, is widely adopted in aerospace, manufacturing, maritime, rail, energy, etc. However, PHM's development is constrained by bottlenecks like generalization, interpretation and verification abilities. Presently, generative artificial intelligence (AI), represented by Large Model, heralds a technological revolution with the potential to fundamentally reshape traditional technological fields and human production methods. Its capabilities, including strong generalization, reasoning, and generative attributes, present opportunities to address PHM's bottlenecks. To this end, based on a systematic analysis of the current challenges and bottlenecks in PHM, as well as the research status and advantages of Large Model, we propose a novel concept and three progressive paradigms of Prognosis and Health Management Large Model (PHM-LM) through the integration of the Large Model with PHM. Subsequently, we provide feasible technical approaches for PHM-LM to bolster PHM's core capabilities within the framework of the three paradigms. Moreover, to address core issues confronting PHM, we discuss a series of technical challenges of PHM-LM throughout the entire process of construction and application. This comprehensive effort offers a holistic PHM-LM technical framework, and provides avenues for new PHM technologies, methodologies, tools, platforms and applications, which also potentially innovates design, research & development, verification and application mode of PHM. And furthermore, a new generation of PHM with AI will also capably be realized, i.e., from custom to generalized, from discriminative to generative, and from theoretical conditions to practical applications. △ Less

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.03187 [pdf]

Holistic view of the road transportation system based on real-time data sharing mechanism

Authors: Li Tao, Dong Xiang, Hao Junfeng, Yin Ping, Xu Xiaoxue, Lai Maokai, Li Yuan, Peng Ting

Abstract: Traditional manual driving and single-vehicle-based intelligent driving have limitations in real-time and accurate acquisition of the current driving status and intentions of surrounding vehicles, leading to vehicles typically maintaining appropriate safe distances from each other. Yet, accidents still frequently occur, especially in merging areas; meanwhile, it is difficult to comprehensively obt… ▽ More Traditional manual driving and single-vehicle-based intelligent driving have limitations in real-time and accurate acquisition of the current driving status and intentions of surrounding vehicles, leading to vehicles typically maintaining appropriate safe distances from each other. Yet, accidents still frequently occur, especially in merging areas; meanwhile, it is difficult to comprehensively obtain the conditions of road infrastructure. These limitations not only restrict the further improvement of road capacity but also result in irreparable losses of life and property. To overcome this bottleneck, this paper constructs a space-time global view of the road traffic system based on a real-time sharing mechanism, enabling both road users and managers to timely access the driving intentions of nearby vehicles and the real-time status of road infrastructure. △ Less

Submitted 3 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

arXiv:2405.15337 [pdf, other]

Discriminative Estimation of Total Variation Distance: A Fidelity Auditor for Generative Data

Authors: Lan Tao, Shirong Xu, Chi-Hua Wang, Namjoon Suh, Guang Cheng

Abstract: With the proliferation of generative AI and the increasing volume of generative data (also called as synthetic data), assessing the fidelity of generative data has become a critical concern. In this paper, we propose a discriminative approach to estimate the total variation (TV) distance between two distributions as an effective measure of generative data fidelity. Our method quantitatively charac… ▽ More With the proliferation of generative AI and the increasing volume of generative data (also called as synthetic data), assessing the fidelity of generative data has become a critical concern. In this paper, we propose a discriminative approach to estimate the total variation (TV) distance between two distributions as an effective measure of generative data fidelity. Our method quantitatively characterizes the relation between the Bayes risk in classifying two distributions and their TV distance. Therefore, the estimation of total variation distance reduces to that of the Bayes risk. In particular, this paper establishes theoretical results regarding the convergence rate of the estimation error of TV distance between two Gaussian distributions. We demonstrate that, with a specific choice of hypothesis class in classification, a fast convergence rate in estimating the TV distance can be achieved. Specifically, the estimation accuracy of the TV distance is proven to inherently depend on the separation of two Gaussian distributions: smaller estimation errors are achieved when the two Gaussian distributions are farther apart. This phenomenon is also validated empirically through extensive simulations. In the end, we apply this discriminative estimation method to rank fidelity of synthetic image data using the MNIST dataset. △ Less

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2403.04261 [pdf]

doi 10.1016/j.jbi.2024.104716

Advancing Chinese biomedical text mining with community challenges

Authors: Hui Zong, Rongrong Wu, Jiaxue Cha, Weizhe Feng, Erman Wu, Jiakun Li, Aibin Shao, Liang Tao, Zuofeng Li, Buzhou Tang, Bairong Shen

Abstract: Objective: This study aims to review the recent advances in community challenges for biomedical text mining in China. Methods: We collected information of evaluation tasks released in community challenges of biomedical text mining, including task description, dataset description, data source, task type and related links. A systematic summary and comparative analysis were conducted on various biome… ▽ More Objective: This study aims to review the recent advances in community challenges for biomedical text mining in China. Methods: We collected information of evaluation tasks released in community challenges of biomedical text mining, including task description, dataset description, data source, task type and related links. A systematic summary and comparative analysis were conducted on various biomedical natural language processing tasks, such as named entity recognition, entity normalization, attribute extraction, relation extraction, event extraction, text classification, text similarity, knowledge graph construction, question answering, text generation, and large language model evaluation. Results: We identified 39 evaluation tasks from 6 community challenges that spanned from 2017 to 2023. Our analysis revealed the diverse range of evaluation task types and data sources in biomedical text mining. We explored the potential clinical applications of these community challenge tasks from a translational biomedical informatics perspective. We compared with their English counterparts, and discussed the contributions, limitations, lessons and guidelines of these community challenges, while highlighting future directions in the era of large language models. Conclusion: Community challenge evaluation competitions have played a crucial role in promoting technology innovation and fostering interdisciplinary collaboration in the field of biomedical text mining. These challenges provide valuable platforms for researchers to develop state-of-the-art solutions. △ Less

Submitted 29 August, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

Journal ref: Journal of Biomedical Informatics. 2024;157:104716.

arXiv:2402.14213 [pdf]

Contrastive Learning of Shared Spatiotemporal EEG Representations Across Individuals for Naturalistic Neuroscience

Authors: Xinke Shen, Lingyi Tao, Xuyang Chen, Sen Song, Quanying Liu, Dan Zhang

Abstract: Neural representations induced by naturalistic stimuli offer insights into how humans respond to stimuli in daily life. Understanding neural mechanisms underlying naturalistic stimuli processing hinges on the precise identification and extraction of the shared neural patterns that are consistently present across individuals. Targeting the Electroencephalogram (EEG) technique, known for its rich sp… ▽ More Neural representations induced by naturalistic stimuli offer insights into how humans respond to stimuli in daily life. Understanding neural mechanisms underlying naturalistic stimuli processing hinges on the precise identification and extraction of the shared neural patterns that are consistently present across individuals. Targeting the Electroencephalogram (EEG) technique, known for its rich spatial and temporal information, this study presents a framework for Contrastive Learning of Shared SpatioTemporal EEG Representations across individuals (CL-SSTER). CL-SSTER utilizes contrastive learning to maximize the similarity of EEG representations across individuals for identical stimuli, contrasting with those for varied stimuli. The network employed spatial and temporal convolutions to simultaneously learn the spatial and temporal patterns inherent in EEG. The versatility of CL-SSTER was demonstrated on three EEG datasets, including a synthetic dataset, a natural speech comprehension EEG dataset, and an emotional video watching EEG dataset. CL-SSTER attained the highest inter-subject correlation (ISC) values compared to the state-of-the-art ISC methods. The latent representations generated by CL-SSTER exhibited reliable spatiotemporal EEG patterns, which can be explained by properties of the naturalistic stimuli. CL-SSTER serves as an interpretable and scalable framework for the identification of inter-subject shared neural representations in naturalistic neuroscience. △ Less

Submitted 13 July, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

Comments: 54 pages, 17 figures

arXiv:2401.15287 [pdf, other]

Applications of Tao General Difference in Discrete Domain

Authors: Linmi Tao, Ruiyang Liu, Donglai Tao, Wu Xia, Feilong Ma, Yu Cheng, Jingmao Cui

Abstract: Numerical difference computation is one of the cores and indispensable in the modern digital era. Tao general difference (TGD) is a novel theory and approach to difference computation for discrete sequences and arrays in multidimensional space. Built on the solid theoretical foundation of the general difference in a finite interval, the TGD operators demonstrate exceptional signal processing capab… ▽ More Numerical difference computation is one of the cores and indispensable in the modern digital era. Tao general difference (TGD) is a novel theory and approach to difference computation for discrete sequences and arrays in multidimensional space. Built on the solid theoretical foundation of the general difference in a finite interval, the TGD operators demonstrate exceptional signal processing capabilities in real-world applications. A novel smoothness property of a sequence is defined on the first- and second TGD. This property is used to denoise one-dimensional signals, where the noise is the non-smooth points in the sequence. Meanwhile, the center of the gradient in a finite interval can be accurately location via TGD calculation. This solves a traditional challenge in computer vision, which is the precise localization of image edges with noise robustness. Furthermore, the power of TGD operators extends to spatio-temporal edge detection in three-dimensional arrays, enabling the identification of kinetic edges in video data. These diverse applications highlight the properties of TGD in discrete domain and the significant promise of TGD for the computation across signal processing, image analysis, and video analytic. △ Less

Submitted 26 January, 2024; originally announced January 2024.

Comments: This paper is the application part of the paper "Tao General Differential and Difference: Theory and Application". The theory part of the paper is renamed as "A Theory of General Difference in Continuous and Discrete Domain", which is Arxived in arXiv:2305.08098v2

arXiv:2401.11396 [pdf, other]

Visual Imitation Learning with Calibrated Contrastive Representation

Authors: Yunke Wang, Linwei Tao, Bo Du, Yutian Lin, Chang Xu

Abstract: Adversarial Imitation Learning (AIL) allows the agent to reproduce expert behavior with low-dimensional states and actions. However, challenges arise in handling visual states due to their less distinguishable representation compared to low-dimensional proprioceptive features. While existing methods resort to adopt complex network architectures or separate the process of learning representation an… ▽ More Adversarial Imitation Learning (AIL) allows the agent to reproduce expert behavior with low-dimensional states and actions. However, challenges arise in handling visual states due to their less distinguishable representation compared to low-dimensional proprioceptive features. While existing methods resort to adopt complex network architectures or separate the process of learning representation and decision-making, they overlook valuable intra-agent information within demonstrations. To address this problem, this paper proposes a simple and effective solution by incorporating calibrated contrastive representative learning into visual AIL framework. Specifically, we present an image encoder in visual AIL, utilizing a combination of unsupervised and supervised contrastive learning to extract valuable features from visual states. Based on the fact that the improved agent often produces demonstrations of varying quality, we propose to calibrate the contrastive loss by treating each agent demonstrations as a mixed sample. The incorporation of contrastive learning can be jointly optimized with the AIL framework, without modifying the architecture or incurring significant computational costs. Experimental results on DMControl Suite demonstrate our proposed method is sample efficient and can outperform other compared methods from different aspects. △ Less

Submitted 20 January, 2024; originally announced January 2024.

arXiv:2401.06763 [pdf, other]

Optimally Blending Honeypots into Production Networks: Hardness and Algorithms

Authors: Md Mahabub Uz Zaman, Liangde Tao, Mark Maldonado, Chang Liu, Ahmed Sunny, Shouhuai Xu, Lin Chen

Abstract: Honeypot is an important cyber defense technique that can expose attackers new attacks. However, the effectiveness of honeypots has not been systematically investigated, beyond the rule of thumb that their effectiveness depends on how they are deployed. In this paper, we initiate a systematic study on characterizing the cybersecurity effectiveness of a new paradigm of deploying honeypots: blending… ▽ More Honeypot is an important cyber defense technique that can expose attackers new attacks. However, the effectiveness of honeypots has not been systematically investigated, beyond the rule of thumb that their effectiveness depends on how they are deployed. In this paper, we initiate a systematic study on characterizing the cybersecurity effectiveness of a new paradigm of deploying honeypots: blending honeypot computers (or IP addresses) into production computers. This leads to the following Honeypot Deployment (HD) problem, How should the defender blend honeypot computers into production computers to maximize the utility in forcing attackers to expose their new attacks while minimizing the loss to the defender in terms of the digital assets stored in the compromised production computers? We formalize HD as a combinatorial optimization problem, prove its NP hardness, provide a near optimal algorithm (i.e., polynomial time approximation scheme). We also conduct simulations to show the impact of attacker capabilities. △ Less

Submitted 12 January, 2024; originally announced January 2024.

Comments: published in 5th International Conference on Science of Cyber Security - SciSec 2023

arXiv:2312.02199 [pdf, other]

USat: A Unified Self-Supervised Encoder for Multi-Sensor Satellite Imagery

Authors: Jeremy Irvin, Lucas Tao, Joanne Zhou, Yuntao Ma, Langston Nashold, Benjamin Liu, Andrew Y. Ng

Abstract: Large, self-supervised vision models have led to substantial advancements for automatically interpreting natural images. Recent works have begun tailoring these methods to remote sensing data which has rich structure with multi-sensor, multi-spectral, and temporal information providing massive amounts of self-labeled data that can be used for self-supervised pre-training. In this work, we develop… ▽ More Large, self-supervised vision models have led to substantial advancements for automatically interpreting natural images. Recent works have begun tailoring these methods to remote sensing data which has rich structure with multi-sensor, multi-spectral, and temporal information providing massive amounts of self-labeled data that can be used for self-supervised pre-training. In this work, we develop a new encoder architecture called USat that can input multi-spectral data from multiple sensors for self-supervised pre-training. USat is a vision transformer with modified patch projection layers and positional encodings to model spectral bands with varying spatial scales from multiple sensors. We integrate USat into a Masked Autoencoder (MAE) self-supervised pre-training procedure and find that a pre-trained USat outperforms state-of-the-art self-supervised MAE models trained on remote sensing data on multiple remote sensing benchmark datasets (up to 8%) and leads to improvements in low data regimes (up to 7%). Code and pre-trained weights are available at https://github.com/stanfordmlgroup/USat . △ Less

Submitted 2 December, 2023; originally announced December 2023.

arXiv:2312.01682 [pdf, other]

ResEnsemble-DDPM: Residual Denoising Diffusion Probabilistic Models for Ensemble Learning

Authors: Shi Zhenning, Dong Changsheng, Xie Xueshuo, Pan Bin, He Along, Li Tao

Abstract: Nowadays, denoising diffusion probabilistic models have been adapted for many image segmentation tasks. However, existing end-to-end models have already demonstrated remarkable capabilities. Rather than using denoising diffusion probabilistic models alone, integrating the abilities of both denoising diffusion probabilistic models and existing end-to-end models can better improve the performance of… ▽ More Nowadays, denoising diffusion probabilistic models have been adapted for many image segmentation tasks. However, existing end-to-end models have already demonstrated remarkable capabilities. Rather than using denoising diffusion probabilistic models alone, integrating the abilities of both denoising diffusion probabilistic models and existing end-to-end models can better improve the performance of image segmentation. Based on this, we implicitly introduce residual term into the diffusion process and propose ResEnsemble-DDPM, which seamlessly integrates the diffusion model and the end-to-end model through ensemble learning. The output distributions of these two models are strictly symmetric with respect to the ground truth distribution, allowing us to integrate the two models by reducing the residual term. Experimental results demonstrate that our ResEnsemble-DDPM can further improve the capabilities of existing models. Furthermore, its ensemble learning strategy can be generalized to other downstream tasks in image generation and get strong competitiveness. △ Less

Submitted 4 December, 2023; originally announced December 2023.

Comments: arXiv admin note: text overlap with arXiv:2311.14900

arXiv:2310.07837 [pdf, other]

Measuring Feature Sparsity in Language Models

Authors: Mingyang Deng, Lucas Tao, Joe Benton

Abstract: Recent works have proposed that activations in language models can be modelled as sparse linear combinations of vectors corresponding to features of input text. Under this assumption, these works aimed to reconstruct feature directions using sparse coding. We develop metrics to assess the success of these sparse coding techniques and test the validity of the linearity and sparsity assumptions. We… ▽ More Recent works have proposed that activations in language models can be modelled as sparse linear combinations of vectors corresponding to features of input text. Under this assumption, these works aimed to reconstruct feature directions using sparse coding. We develop metrics to assess the success of these sparse coding techniques and test the validity of the linearity and sparsity assumptions. We show our metrics can predict the level of sparsity on synthetic sparse linear activations, and can distinguish between sparse linear data and several other distributions. We use our metrics to measure levels of sparsity in several language models. We find evidence that language model activations can be accurately modelled by sparse linear combinations of features, significantly more so than control datasets. We also show that model activations appear to be sparsest in the first and final layers. △ Less

Submitted 13 October, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

arXiv:2310.04724 [pdf, other]

Activate and Reject: Towards Safe Domain Generalization under Category Shift

Authors: Chaoqi Chen, Luyao Tang, Leitian Tao, Hong-Yu Zhou, Yue Huang, Xiaoguang Han, Yizhou Yu

Abstract: Albeit the notable performance on in-domain test points, it is non-trivial for deep neural networks to attain satisfactory accuracy when deploying in the open world, where novel domains and object classes often occur. In this paper, we study a practical problem of Domain Generalization under Category Shift (DGCS), which aims to simultaneously detect unknown-class samples and classify known-class s… ▽ More Albeit the notable performance on in-domain test points, it is non-trivial for deep neural networks to attain satisfactory accuracy when deploying in the open world, where novel domains and object classes often occur. In this paper, we study a practical problem of Domain Generalization under Category Shift (DGCS), which aims to simultaneously detect unknown-class samples and classify known-class samples in the target domains. Compared to prior DG works, we face two new challenges: 1) how to learn the concept of ``unknown'' during training with only source known-class samples, and 2) how to adapt the source-trained model to unseen environments for safe model deployment. To this end, we propose a novel Activate and Reject (ART) framework to reshape the model's decision boundary to accommodate unknown classes and conduct post hoc modification to further discriminate known and unknown classes using unlabeled test data. Specifically, during training, we promote the response to the unknown by optimizing the unknown probability and then smoothing the overall output to mitigate the overconfidence issue. At test time, we introduce a step-wise online adaptation method that predicts the label by virtue of the cross-domain nearest neighbor and class prototype information without updating the network's parameters or using threshold-based mechanisms. Experiments reveal that ART consistently improves the generalization capability of deep networks on different vision tasks. For image classification, ART improves the H-score by 6.1% on average compared to the previous best method. For object detection and semantic segmentation, we establish new benchmarks and achieve competitive performance. △ Less

Submitted 7 October, 2023; originally announced October 2023.

Comments: ICCV 2023

arXiv:2309.07350 [pdf]

Curriculum-based Sensing Reduction in Simulation to Real-World Transfer for In-hand Manipulation

Authors: Lingfeng Tao, Jiucai Zhang, Qiaojie Zheng, Xiaoli Zhang

Abstract: Simulation to Real-World Transfer allows affordable and fast training of learning-based robots for manipulation tasks using Deep Reinforcement Learning methods. Currently, Sim2Real uses Asymmetric Actor-Critic approaches to reduce the rich idealized features in simulation to the accessible ones in the real world. However, the feature reduction from the simulation to the real world is conducted thr… ▽ More Simulation to Real-World Transfer allows affordable and fast training of learning-based robots for manipulation tasks using Deep Reinforcement Learning methods. Currently, Sim2Real uses Asymmetric Actor-Critic approaches to reduce the rich idealized features in simulation to the accessible ones in the real world. However, the feature reduction from the simulation to the real world is conducted through an empirically defined one-step curtail. Small feature reduction does not sufficiently remove the actor's features, which may still cause difficulty setting up the physical system, while large feature reduction may cause difficulty and inefficiency in training. To address this issue, we proposed Curriculum-based Sensing Reduction to enable the actor to start with the same rich feature space as the critic and then get rid of the hard-to-extract features step-by-step for higher training performance and better adaptation for real-world feature space. The reduced features are replaced with random signals from a Deep Random Generator to remove the dependency between the output and the removed features and avoid creating new dependencies. The methods are evaluated on the Allegro robot hand in a real-world in-hand manipulation task. The results show that our methods have faster training and higher task performance than baselines and can solve real-world tasks when selected tactile features are reduced. △ Less

Submitted 13 September, 2023; originally announced September 2023.

arXiv:2309.07349 [pdf]

Stable In-hand Manipulation with Finger Specific Multi-agent Shadow Reward

Authors: Lingfeng Tao, Jiucai Zhang, Xiaoli Zhang

Abstract: Deep Reinforcement Learning has shown its capability to solve the high degrees of freedom in control and the complex interaction with the object in the multi-finger dexterous in-hand manipulation tasks. Current DRL approaches prefer sparse rewards to dense rewards for the ease of training but lack behavior constraints during the manipulation process, leading to aggressive and unstable policies tha… ▽ More Deep Reinforcement Learning has shown its capability to solve the high degrees of freedom in control and the complex interaction with the object in the multi-finger dexterous in-hand manipulation tasks. Current DRL approaches prefer sparse rewards to dense rewards for the ease of training but lack behavior constraints during the manipulation process, leading to aggressive and unstable policies that are insufficient for safety-critical in-hand manipulation tasks. Dense rewards can regulate the policy to learn stable manipulation behaviors with continuous reward constraints but are hard to empirically define and slow to converge optimally. This work proposes the Finger-specific Multi-agent Shadow Reward (FMSR) method to determine the stable manipulation constraints in the form of dense reward based on the state-action occupancy measure, a general utility of DRL that is approximated during the learning process. Information Sharing (IS) across neighboring agents enables consensus training to accelerate the convergence. The methods are evaluated in two in-hand manipulation tasks on the Shadow Hand. The results show FMSR+IS converges faster in training, achieving a higher task success rate and better manipulation stability than conventional dense reward. The comparison indicates FMSR+IS achieves a comparable success rate even with the behavior constraint but much better manipulation stability than the policy trained with a sparse reward. △ Less

Submitted 13 September, 2023; originally announced September 2023.

arXiv:2308.12050 [pdf, other]

Aligning Language Models with Offline Learning from Human Feedback

Authors: Jian Hu, Li Tao, June Yang, Chandler Zhou

Abstract: Learning from human preferences is crucial for language models (LMs) to effectively cater to human needs and societal values. Previous research has made notable progress by leveraging human feedback to follow instructions. However, these approaches rely primarily on online learning techniques like Proximal Policy Optimization (PPO), which have been proven unstable and challenging to tune for langu… ▽ More Learning from human preferences is crucial for language models (LMs) to effectively cater to human needs and societal values. Previous research has made notable progress by leveraging human feedback to follow instructions. However, these approaches rely primarily on online learning techniques like Proximal Policy Optimization (PPO), which have been proven unstable and challenging to tune for language models. Moreover, PPO requires complex distributed system implementation, hindering the efficiency of large-scale distributed training. In this study, we propose an offline learning from human feedback framework to align LMs without interacting with environments. Specifically, we explore filtering alignment (FA), reward-weighted regression (RWR), and conditional alignment (CA) to align language models to human preferences. By employing a loss function similar to supervised fine-tuning, our methods ensure more stable model training than PPO with a simple machine learning system~(MLSys) and much fewer (around 9\%) computing resources. Experimental results demonstrate that conditional alignment outperforms other offline alignment methods and is comparable to PPO. △ Less

Submitted 9 December, 2023; v1 submitted 23 August, 2023; originally announced August 2023.

arXiv:2308.11838 [pdf, other]

A Benchmark Study on Calibration

Authors: Linwei Tao, Younan Zhu, Haolan Guo, Minjing Dong, Chang Xu

Abstract: Deep neural networks are increasingly utilized in various machine learning tasks. However, as these models grow in complexity, they often face calibration issues, despite enhanced prediction accuracy. Many studies have endeavored to improve calibration performance through the use of specific loss functions, data preprocessing and training frameworks. Yet, investigations into calibration properties… ▽ More Deep neural networks are increasingly utilized in various machine learning tasks. However, as these models grow in complexity, they often face calibration issues, despite enhanced prediction accuracy. Many studies have endeavored to improve calibration performance through the use of specific loss functions, data preprocessing and training frameworks. Yet, investigations into calibration properties have been somewhat overlooked. Our study leverages the Neural Architecture Search (NAS) search space, offering an exhaustive model architecture space for thorough calibration properties exploration. We specifically create a model calibration dataset. This dataset evaluates 90 bin-based and 12 additional calibration measurements across 117,702 unique neural networks within the widely employed NATS-Bench search space. Our analysis aims to answer several longstanding questions in the field, using our proposed dataset: (i) Can model calibration be generalized across different datasets? (ii) Can robustness be used as a calibration measurement? (iii) How reliable are calibration metrics? (iv) Does a post-hoc calibration method affect all models uniformly? (v) How does calibration interact with accuracy? (vi) What is the impact of bin size on calibration measurement? (vii) Which architectural designs are beneficial for calibration? Additionally, our study bridges an existing gap by exploring calibration within NAS. By providing this dataset, we enable further research into NAS calibration. As far as we are aware, our research represents the first large-scale investigation into calibration properties and the premier study of calibration issues within NAS. The project page can be found at https://www.taolinwei.com/calibration-study △ Less

Submitted 22 March, 2024; v1 submitted 22 August, 2023; originally announced August 2023.

Comments: ICLR 2024 poster

arXiv:2308.10487 [pdf, other]

Deciphering Raw Data in Neuro-Symbolic Learning with Provable Guarantees

Authors: Lue Tao, Yu-Xuan Huang, Wang-Zhou Dai, Yuan Jiang

Abstract: Neuro-symbolic hybrid systems are promising for integrating machine learning and symbolic reasoning, where perception models are facilitated with information inferred from a symbolic knowledge base through logical reasoning. Despite empirical evidence showing the ability of hybrid systems to learn accurate perception models, the theoretical understanding of learnability is still lacking. Hence, it… ▽ More Neuro-symbolic hybrid systems are promising for integrating machine learning and symbolic reasoning, where perception models are facilitated with information inferred from a symbolic knowledge base through logical reasoning. Despite empirical evidence showing the ability of hybrid systems to learn accurate perception models, the theoretical understanding of learnability is still lacking. Hence, it remains unclear why a hybrid system succeeds for a specific task and when it may fail given a different knowledge base. In this paper, we introduce a novel way of characterising supervision signals from a knowledge base, and establish a criterion for determining the knowledge's efficacy in facilitating successful learning. This, for the first time, allows us to address the two questions above by inspecting the knowledge base under investigation. Our analysis suggests that many knowledge bases satisfy the criterion, thus enabling effective learning, while some fail to satisfy it, indicating potential failures. Comprehensive experiments confirm the utility of our criterion on benchmark tasks. △ Less

Submitted 23 January, 2024; v1 submitted 21 August, 2023; originally announced August 2023.

arXiv:2308.02815 [pdf, other]

The changing rule of human bone density with aging based on a novel definition and mensuration of bone density with computed tomography

Authors: Linmi Tao, Ruiyang Liu, Yuanbiao Wang, Yuezhi Zhou, Li Huo, Guilan Hu, Xiangsong Zhang, Zuo-Xiang He

Abstract: Osteoporosis and fragility fractures have emerged as major public health concerns in an aging population. However, measuring age-related changes in bone density using dual-energy X-ray absorptiometry has limited personalized risk assessment due to susceptibility to interference from various factors. In this study, we propose an innovative statistical model of bone pixel distribution in fine-segmen… ▽ More Osteoporosis and fragility fractures have emerged as major public health concerns in an aging population. However, measuring age-related changes in bone density using dual-energy X-ray absorptiometry has limited personalized risk assessment due to susceptibility to interference from various factors. In this study, we propose an innovative statistical model of bone pixel distribution in fine-segmented computed tomography (CT) images, along with a novel approach to measuring bone density based on CT values of bone pixels. Our findings indicate that bone density exhibits a linear decline with age during adulthood between the ages of 39 and 80, with the rate of decline being approximately 1.6 times faster in women than in men. This contradicts the widely accepted notion that bone density starts declining in women at menopause and in men at around 50 years of age. The linearity of age-related changes provides further insights into the dynamics of the aging human body. Consequently, our findings suggest that the definition of osteoporosis by the World Health Organization should be revised to the standard deviation of age-based bone density. Furthermore, these results open up new avenues for research in bone health care and clinical investigation of osteoporosis. △ Less

Submitted 5 August, 2023; originally announced August 2023.

arXiv:2306.11307 [pdf, other]

Transforming Graphs for Enhanced Attribute Clustering: An Innovative Graph Transformer-Based Method

Authors: Shuo Han, Jiacheng Liu, Jiayun Wu, Yinan Chen, Li Tao

Abstract: Graph Representation Learning (GRL) is an influential methodology, enabling a more profound understanding of graph-structured data and aiding graph clustering, a critical task across various domains. The recent incursion of attention mechanisms, originally an artifact of Natural Language Processing (NLP), into the realm of graph learning has spearheaded a notable shift in research trends. Conseque… ▽ More Graph Representation Learning (GRL) is an influential methodology, enabling a more profound understanding of graph-structured data and aiding graph clustering, a critical task across various domains. The recent incursion of attention mechanisms, originally an artifact of Natural Language Processing (NLP), into the realm of graph learning has spearheaded a notable shift in research trends. Consequently, Graph Attention Networks (GATs) and Graph Attention Auto-Encoders have emerged as preferred tools for graph clustering tasks. Yet, these methods primarily employ a local attention mechanism, thereby curbing their capacity to apprehend the intricate global dependencies between nodes within graphs. Addressing these impediments, this study introduces an innovative method known as the Graph Transformer Auto-Encoder for Graph Clustering (GTAGC). By melding the Graph Auto-Encoder with the Graph Transformer, GTAGC is adept at capturing global dependencies between nodes. This integration amplifies the graph representation and surmounts the constraints posed by the local attention mechanism. The architecture of GTAGC encompasses graph embedding, integration of the Graph Transformer within the autoencoder structure, and a clustering component. It strategically alternates between graph embedding and clustering, thereby tailoring the Graph Transformer for clustering tasks, whilst preserving the graph's global structural information. Through extensive experimentation on diverse benchmark datasets, GTAGC has exhibited superior performance against existing state-of-the-art graph clustering methodologies. △ Less

Submitted 12 August, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

arXiv:2305.13665 [pdf, other]

Dual Focal Loss for Calibration

Authors: Linwei Tao, Minjing Dong, Chang Xu

Abstract: The use of deep neural networks in real-world applications require well-calibrated networks with confidence scores that accurately reflect the actual probability. However, it has been found that these networks often provide over-confident predictions, which leads to poor calibration. Recent efforts have sought to address this issue by focal loss to reduce over-confidence, but this approach can als… ▽ More The use of deep neural networks in real-world applications require well-calibrated networks with confidence scores that accurately reflect the actual probability. However, it has been found that these networks often provide over-confident predictions, which leads to poor calibration. Recent efforts have sought to address this issue by focal loss to reduce over-confidence, but this approach can also lead to under-confident predictions. While different variants of focal loss have been explored, it is difficult to find a balance between over-confidence and under-confidence. In our work, we propose a new loss function by focusing on dual logits. Our method not only considers the ground truth logit, but also take into account the highest logit ranked after the ground truth logit. By maximizing the gap between these two logits, our proposed dual focal loss can achieve a better balance between over-confidence and under-confidence. We provide theoretical evidence to support our approach and demonstrate its effectiveness through evaluations on multiple models and datasets, where it achieves state-of-the-art performance. Code is available at https://github.com/Linwei94/DualFocalLoss △ Less

Submitted 23 May, 2023; originally announced May 2023.

Comments: ICML 2023 Accept

arXiv:2305.08098 [pdf, other]

A Theory of General Difference in Continuous and Discrete Domain

Authors: Linmi Tao, Ruiyang Liu, Donglai Tao, Wu Xia, Feilong Ma, Yu Cheng, Jingmao Cui

Abstract: Though a core element of the digital age, numerical difference algorithms struggle with noise susceptibility. This stems from a key disconnect between the infinitesimal quantities in continuous differentiation and the finite intervals in its discrete counterpart. This disconnect violates the fundamental definition of differentiation (Leibniz and Cauchy). To bridge this gap, we build a novel genera… ▽ More Though a core element of the digital age, numerical difference algorithms struggle with noise susceptibility. This stems from a key disconnect between the infinitesimal quantities in continuous differentiation and the finite intervals in its discrete counterpart. This disconnect violates the fundamental definition of differentiation (Leibniz and Cauchy). To bridge this gap, we build a novel general difference (Tao General Difference, TGD). Departing from derivative-by-integration, TGD generalizes differentiation to finite intervals in continuous domains through three key constraints. This allows us to calculate the general difference of a sequence in discrete domain via the continuous step function constructed from the sequence. Two construction methods, the rotational construction and the orthogonal construction, are proposed to construct the operators of TGD. The construction TGD operators take same convolution mode in calculation for continuous functions, discrete sequences, and arrays across any dimension. Our analysis with example operations showcases TGD's capability in both continuous and discrete domains, paving the way for accurate and noise-resistant differentiation in the digital era. △ Less

Submitted 25 January, 2024; v1 submitted 14 May, 2023; originally announced May 2023.

arXiv:2303.17482 [pdf]

Three-way causal attribute partial order structure analysis

Authors: Xue Zaifa, Lu Huibin, Zhang Tao, Li Tao, Lu Xin

Abstract: As an emerging concept cognitive learning model, partial order formal structure analysis (POFSA) has been widely used in the field of knowledge processing. In this paper, we propose the method named three-way causal attribute partial order structure (3WCAPOS) to evolve the POFSA from set coverage to causal coverage in order to increase the interpretability and classification performance of the mod… ▽ More As an emerging concept cognitive learning model, partial order formal structure analysis (POFSA) has been widely used in the field of knowledge processing. In this paper, we propose the method named three-way causal attribute partial order structure (3WCAPOS) to evolve the POFSA from set coverage to causal coverage in order to increase the interpretability and classification performance of the model. First, the concept of causal factor (CF) is proposed to evaluate the causal correlation between attributes and decision attributes in the formal decision context. Then, combining CF with attribute partial order structure, the concept of causal attribute partial order structure is defined and makes set coverage evolve into causal coverage. Finally, combined with the idea of three-way decision, 3WCAPOS is formed, which makes the purity of nodes in the structure clearer and the changes between levels more obviously. In addition, the experiments are carried out from the classification ability and the interpretability of the structure through the six datasets. Through these experiments, it is concluded the accuracy of 3WCAPOS is improved by 1% - 9% compared with classification and regression tree, and more interpretable and the processing of knowledge is more reasonable compared with attribute partial order structure. △ Less

Submitted 28 March, 2023; originally announced March 2023.

arXiv:2303.02966 [pdf, other]

Non-Parametric Outlier Synthesis

Authors: Leitian Tao, Xuefeng Du, Xiaojin Zhu, Yixuan Li

Abstract: Out-of-distribution (OOD) detection is indispensable for safely deploying machine learning models in the wild. One of the key challenges is that models lack supervision signals from unknown data, and as a result, can produce overconfident predictions on OOD data. Recent work on outlier synthesis modeled the feature space as parametric Gaussian distribution, a strong and restrictive assumption that… ▽ More Out-of-distribution (OOD) detection is indispensable for safely deploying machine learning models in the wild. One of the key challenges is that models lack supervision signals from unknown data, and as a result, can produce overconfident predictions on OOD data. Recent work on outlier synthesis modeled the feature space as parametric Gaussian distribution, a strong and restrictive assumption that might not hold in reality. In this paper, we propose a novel framework, Non-Parametric Outlier Synthesis (NPOS), which generates artificial OOD training data and facilitates learning a reliable decision boundary between ID and OOD data. Importantly, our proposed synthesis approach does not make any distributional assumption on the ID embeddings, thereby offering strong flexibility and generality. We show that our synthesis approach can be mathematically interpreted as a rejection sampling framework. Extensive experiments show that NPOS can achieve superior OOD detection performance, outperforming the competitive rivals by a significant margin. Code is publicly available at https://github.com/deeplearning-wisc/npos. △ Less

Submitted 6 March, 2023; originally announced March 2023.

Comments: ICLR 2023

arXiv:2302.06245 [pdf, other]

Calibrating a Deep Neural Network with Its Predecessors

Authors: Linwei Tao, Minjing Dong, Daochang Liu, Changming Sun, Chang Xu

Abstract: Confidence calibration - the process to calibrate the output probability distribution of neural networks - is essential for safety-critical applications of such networks. Recent works verify the link between mis-calibration and overfitting. However, early stopping, as a well-known technique to mitigate overfitting, fails to calibrate networks. In this work, we study the limitions of early stopping… ▽ More Confidence calibration - the process to calibrate the output probability distribution of neural networks - is essential for safety-critical applications of such networks. Recent works verify the link between mis-calibration and overfitting. However, early stopping, as a well-known technique to mitigate overfitting, fails to calibrate networks. In this work, we study the limitions of early stopping and comprehensively analyze the overfitting problem of a network considering each individual block. We then propose a novel regularization method, predecessor combination search (PCS), to improve calibration by searching a combination of best-fitting block predecessors, where block predecessors are the corresponding network blocks with weight parameters from earlier training stages. PCS achieves the state-of-the-art calibration performance on multiple datasets and architectures. In addition, PCS improves model robustness under dataset distribution shift. △ Less

Submitted 23 May, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

Comments: IJCAI 2023 Accept

arXiv:2212.07751 [pdf, other]

Combating Uncertainty and Class Imbalance in Facial Expression Recognition

Authors: Jiaxiang Fan, Jian Zhou, Xiaoyu Deng, Huabin Wang, Liang Tao, Hon Keung Kwan

Abstract: Recognition of facial expression is a challenge when it comes to computer vision. The primary reasons are class imbalance due to data collection and uncertainty due to inherent noise such as fuzzy facial expressions and inconsistent labels. However, current research has focused either on the problem of class imbalance or on the problem of uncertainty, ignoring the intersection of how to address th… ▽ More Recognition of facial expression is a challenge when it comes to computer vision. The primary reasons are class imbalance due to data collection and uncertainty due to inherent noise such as fuzzy facial expressions and inconsistent labels. However, current research has focused either on the problem of class imbalance or on the problem of uncertainty, ignoring the intersection of how to address these two problems. Therefore, in this paper, we propose a framework based on Resnet and Attention to solve the above problems. We design weight for each class. Through the penalty mechanism, our model will pay more attention to the learning of small samples during training, and the resulting decrease in model accuracy can be improved by a Convolutional Block Attention Module (CBAM). Meanwhile, our backbone network will also learn an uncertain feature for each sample. By mixing uncertain features between samples, the model can better learn those features that can be used for classification, thus suppressing uncertainty. Experiments show that our method surpasses most basic methods in terms of accuracy on facial expression data sets (e.g., AffectNet, RAF-DB), and it also solves the problem of class imbalance well. △ Less

Submitted 15 December, 2022; originally announced December 2022.

arXiv:2212.07163 [pdf, other]

Multi-Scale Feature Fusion Transformer Network for End-to-End Single Channel Speech Separation

Authors: Yinhao Xu, Jian Zhou, Liang Tao, Hon Keung Kwan

Abstract: Recently studies on time-domain audio separation networks (TasNets) have made a great stride in speech separation. One of the most representative TasNets is a network with a dual-path segmentation approach. However, the original model called DPRNN used a fixed feature dimension and unchanged segment size throughout all layers of the network. In this paper, we propose a multi-scale feature fusion t… ▽ More Recently studies on time-domain audio separation networks (TasNets) have made a great stride in speech separation. One of the most representative TasNets is a network with a dual-path segmentation approach. However, the original model called DPRNN used a fixed feature dimension and unchanged segment size throughout all layers of the network. In this paper, we propose a multi-scale feature fusion transformer network (MSFFT-Net) based on the conventional dual-path structure for single-channel speech separation. Unlike the conventional dual-path structure where only one processing path exists, adopting several iterative blocks with alternative intra-chunk and inter-chunk operations to capture local and global context information, the proposed MSFFT-Net has multiple parallel processing paths where the feature information can be exchanged between multiple parallel processing paths. Experiments show that our proposed networks based on multi-scale feature fusion structure have achieved better results than the original dual-path model on the benchmark dataset-WSJ0-2mix, where the SI-SNRi score of MSFFT-3P is 20.7dB (1.47% improvement), and MSFFT-2P is 21.0dB (3.45% improvement), which achieves SOTA on WSJ0-2mix without any data augmentation method. △ Less

Submitted 14 December, 2022; originally announced December 2022.

arXiv:2211.06590 [pdf, other]

Significant Ties Graph Neural Networks for Continuous-Time Temporal Networks Modeling

Authors: Jiayun Wu, Tao Jia, Yansong Wang, Li Tao

Abstract: Temporal networks are suitable for modeling complex evolving systems. It has a wide range of applications, such as social network analysis, recommender systems, and epidemiology. Recently, modeling such dynamic systems has drawn great attention in many domains. However, most existing approaches resort to taking discrete snapshots of the temporal networks and modeling all events with equal importan… ▽ More Temporal networks are suitable for modeling complex evolving systems. It has a wide range of applications, such as social network analysis, recommender systems, and epidemiology. Recently, modeling such dynamic systems has drawn great attention in many domains. However, most existing approaches resort to taking discrete snapshots of the temporal networks and modeling all events with equal importance. This paper proposes Significant Ties Graph Neural Networks (STGNN), a novel framework that captures and describes significant ties. To better model the diversity of interactions, STGNN introduces a novel aggregation mechanism to organize the most significant historical neighbors' information and adaptively obtain the significance of node pairs. Experimental results on four real networks demonstrate the effectiveness of the proposed framework. △ Less

Submitted 12 November, 2022; originally announced November 2022.

Comments: 9 pages, 5 figures

arXiv:2210.16423 [pdf]

Transferability-based Chain Motion Mapping from Humans to Humanoids for Teleoperation

Authors: Matthew Stanley, Yunsik Jung, Michael Bowman, Lingfeng Tao, Xiaoli Zhang

Abstract: Although data-driven motion mapping methods are promising to allow intuitive robot control and teleoperation that generate human-like robot movement, they normally require tedious pair-wise training for each specific human and robot pair. This paper proposes a transferability-based mapping scheme to allow new robot and human input systems to leverage the mapping of existing trained pairs to form a… ▽ More Although data-driven motion mapping methods are promising to allow intuitive robot control and teleoperation that generate human-like robot movement, they normally require tedious pair-wise training for each specific human and robot pair. This paper proposes a transferability-based mapping scheme to allow new robot and human input systems to leverage the mapping of existing trained pairs to form a mapping transfer chain, which will reduce the number of new pair-specific mappings that need to be generated. The first part of the mapping schematic is the development of a Synergy Mapping via Dual-Autoencoder (SyDa) method. This method uses the latent features from two autoencoders to extract the common synergy of the two agents. Secondly, a transferability metric is created that approximates how well the mapping between a pair of agents will perform compared to another pair before creating the motion mapping models. Thus, it can guide the formation of an optimal mapping chain for the new human-robot pair. Experiments with human subjects and a Pepper robot demonstrated 1) The SyDa method improves the accuracy and generalizability of the pair mappings, 2) the SyDa method allows for bidirectional mapping that does not prioritize the direction of mapping motion, and 3) the transferability metric measures how compatible two agents are for accurate teleoperation. The combination of the SyDa method and transferability metric creates generalizable and accurate mapping need to create the transfer mapping chain. △ Less

Submitted 28 October, 2022; originally announced October 2022.

arXiv:2210.05767 [pdf]

doi 10.1109/ICRA48891.2023.10160909

A Multi-Agent Approach for Adaptive Finger Cooperation in Learning-based In-Hand Manipulation

Authors: Lingfeng Tao, Jiucai Zhang, Michael Bowman, Xiaoli Zhang

Abstract: In-hand manipulation is challenging for a multi-finger robotic hand due to its high degrees of freedom and the complex interaction with the object. To enable in-hand manipulation, existing deep reinforcement learning based approaches mainly focus on training a single robot-structure-specific policy through the centralized learning mechanism, lacking adaptability to changes like robot malfunction.… ▽ More In-hand manipulation is challenging for a multi-finger robotic hand due to its high degrees of freedom and the complex interaction with the object. To enable in-hand manipulation, existing deep reinforcement learning based approaches mainly focus on training a single robot-structure-specific policy through the centralized learning mechanism, lacking adaptability to changes like robot malfunction. To solve this limitation, this work treats each finger as an individual agent and trains multiple agents to control their assigned fingers to complete the in-hand manipulation task cooperatively. We propose the Multi-Agent Global-Observation Critic and Local-Observation Actor (MAGCLA) method, where the critic can observe all agents' actions globally, and the actor only locally observes its neighbors' actions. Besides, conventional individual experience replay may cause unstable cooperation due to the asynchronous performance increment of each agent, which is critical for in-hand manipulation tasks. To solve this issue, we propose the Synchronized Hindsight Experience Replay (SHER) method to synchronize and efficiently reuse the replayed experience across all agents. The methods are evaluated in two in-hand manipulation tasks on the Shadow dexterous hand. The results show that SHER helps MAGCLA achieve comparable learning efficiency to a single policy, and the MAGCLA approach is more generalizable in different tasks. The trained policies have higher adaptability in the robot malfunction test compared to the baseline multi-agent and single-agent approaches. △ Less

Submitted 11 October, 2022; originally announced October 2022.

Comments: Submitted to ICRA 2023

arXiv:2210.05406 [pdf, other]

Code Librarian: A Software Package Recommendation System

Authors: Lili Tao, Alexandru-Petre Cazan, Senad Ibraimoski, Sean Moran

Abstract: The use of packaged libraries can significantly shorten the software development cycle by improving the quality and readability of code. In this paper, we present a recommendation engine called Librarian for open source libraries. A candidate library package is recommended for a given context if: 1) it has been frequently used with the imported libraries in the program; 2) it has similar functiona… ▽ More The use of packaged libraries can significantly shorten the software development cycle by improving the quality and readability of code. In this paper, we present a recommendation engine called Librarian for open source libraries. A candidate library package is recommended for a given context if: 1) it has been frequently used with the imported libraries in the program; 2) it has similar functionality to the imported libraries in the program; 3) it has similar functionality to the developer's implementation, and 4) it can be used efficiently in the context of the provided code. We apply the state-of-the-art CodeBERT-based model for analysing the context of the source code to deliver relevant library recommendations to users. △ Less

Submitted 7 February, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

arXiv:2206.08802 [pdf, other]

Open-Sampling: Exploring Out-of-Distribution data for Re-balancing Long-tailed datasets

Authors: Hongxin Wei, Lue Tao, Renchunzi Xie, Lei Feng, Bo An

Abstract: Deep neural networks usually perform poorly when the training dataset suffers from extreme class imbalance. Recent studies found that directly training with out-of-distribution data (i.e., open-set samples) in a semi-supervised manner would harm the generalization performance. In this work, we theoretically show that out-of-distribution data can still be leveraged to augment the minority classes f… ▽ More Deep neural networks usually perform poorly when the training dataset suffers from extreme class imbalance. Recent studies found that directly training with out-of-distribution data (i.e., open-set samples) in a semi-supervised manner would harm the generalization performance. In this work, we theoretically show that out-of-distribution data can still be leveraged to augment the minority classes from a Bayesian perspective. Based on this motivation, we propose a novel method called Open-sampling, which utilizes open-set noisy labels to re-balance the class priors of the training dataset. For each open-set instance, the label is sampled from our pre-defined distribution that is complementary to the distribution of original class priors. We empirically show that Open-sampling not only re-balances the class priors but also encourages the neural network to learn separable representations. Extensive experiments demonstrate that our proposed method significantly outperforms existing data re-balancing methods and can boost the performance of existing state-of-the-art methods. △ Less

Submitted 5 July, 2022; v1 submitted 17 June, 2022; originally announced June 2022.

Comments: Accepted by ICML 2022

arXiv:2205.13561 [pdf]

Physics-Guided Hierarchical Reward Mechanism for Learning-Based Robotic Grasping

Authors: Yunsik Jung, Lingfeng Tao, Michael Bowman, Jiucai Zhang, Xiaoli Zhang

Abstract: Learning-based grasping can afford real-time grasp motion planning of multi-fingered robotics hands thanks to its high computational efficiency. However, learning-based methods are required to explore large search spaces during the learning process. The search space causes low learning efficiency, which has been the main barrier to its practical adoption. In addition, the trained policy lacks a ge… ▽ More Learning-based grasping can afford real-time grasp motion planning of multi-fingered robotics hands thanks to its high computational efficiency. However, learning-based methods are required to explore large search spaces during the learning process. The search space causes low learning efficiency, which has been the main barrier to its practical adoption. In addition, the trained policy lacks a generalizable outcome unless objects are identical to the trained objects. In this work, we develop a novel Physics-Guided Deep Reinforcement Learning with a Hierarchical Reward Mechanism to improve learning efficiency and generalizability for learning-based autonomous grasping. Unlike conventional observation-based grasp learning, physics-informed metrics are utilized to convey correlations between features associated with hand structures and objects to improve learning efficiency and outcomes. Further, the hierarchical reward mechanism enables the robot to learn prioritized components of the grasping tasks. Our method is validated in robotic grasping tasks with a 3-finger MICO robot arm. The results show that our method outperformed the standard Deep Reinforcement Learning methods in various robotic grasping tasks. △ Less

Submitted 23 July, 2023; v1 submitted 26 May, 2022; originally announced May 2022.

arXiv:2205.13441 [pdf]

doi 10.1007/s10846-022-01680-7

Multi-Phase Multi-Objective Dexterous Manipulation with Adaptive Hierarchical Curriculum

Authors: Lingfeng Tao, Jiucai Zhang, Xiaoli Zhang

Abstract: Dexterous manipulation tasks usually have multiple objectives, and the priorities of these objectives may vary at different phases of a manipulation task. Varying priority makes a robot hardly or even failed to learn an optimal policy with a deep reinforcement learning (DRL) method. To solve this problem, we develop a novel Adaptive Hierarchical Reward Mechanism (AHRM) to guide the DRL agent to le… ▽ More Dexterous manipulation tasks usually have multiple objectives, and the priorities of these objectives may vary at different phases of a manipulation task. Varying priority makes a robot hardly or even failed to learn an optimal policy with a deep reinforcement learning (DRL) method. To solve this problem, we develop a novel Adaptive Hierarchical Reward Mechanism (AHRM) to guide the DRL agent to learn manipulation tasks with multiple prioritized objectives. The AHRM can determine the objective priorities during the learning process and update the reward hierarchy to adapt to the changing objective priorities at different phases. The proposed method is validated in a multi-objective manipulation task with a JACO robot arm in which the robot needs to manipulate a target with obstacles surrounded. The simulation and physical experiment results show that the proposed method improved robot learning in task performance and learning efficiency. △ Less

Submitted 28 July, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

Comments: Accepted by the Journal of Intelligent & Robotic Systems

arXiv:2205.08585 [pdf, other]

CV4Code: Sourcecode Understanding via Visual Code Representations

Authors: Ruibo Shi, Lili Tao, Rohan Saphal, Fran Silavong, Sean J. Moran

Abstract: We present CV4Code, a compact and effective computer vision method for sourcecode understanding. Our method leverages the contextual and the structural information available from the code snippet by treating each snippet as a two-dimensional image, which naturally encodes the context and retains the underlying structural information through an explicit spatial representation. To codify snippets as… ▽ More We present CV4Code, a compact and effective computer vision method for sourcecode understanding. Our method leverages the contextual and the structural information available from the code snippet by treating each snippet as a two-dimensional image, which naturally encodes the context and retains the underlying structural information through an explicit spatial representation. To codify snippets as images, we propose an ASCII codepoint-based image representation that facilitates fast generation of sourcecode images and eliminates redundancy in the encoding that would arise from an RGB pixel representation. Furthermore, as sourcecode is treated as images, neither lexical analysis (tokenisation) nor syntax tree parsing is required, which makes the proposed method agnostic to any particular programming language and lightweight from the application pipeline point of view. CV4Code can even featurise syntactically incorrect code which is not possible from methods that depend on the Abstract Syntax Tree (AST). We demonstrate the effectiveness of CV4Code by learning Convolutional and Transformer networks to predict the functional task, i.e. the problem it solves, of the source code directly from its two-dimensional representation, and using an embedding from its latent space to derive a similarity score of two code snippets in a retrieval setup. Experimental results show that our approach achieves state-of-the-art performance in comparison to other methods with the same task and data configurations. For the first time we show the benefits of treating sourcecode understanding as a form of image processing task. △ Less

Submitted 11 May, 2022; originally announced May 2022.

arXiv:2205.00771 [pdf, other]

Local Differential Privacy Meets Computational Social Choice -- Resilience under Voter Deletion

Authors: Liangde Tao, Lin Chen, Lei Xu, Weidong Shi

Abstract: The resilience of a voting system has been a central topic in computational social choice. Many voting rules, like plurality, are shown to be vulnerable as the attacker can target specific voters to manipulate the result. What if a local differential privacy (LDP) mechanism is adopted such that the true preference of a voter is never revealed in pre-election polls? In this case, the attacker can o… ▽ More The resilience of a voting system has been a central topic in computational social choice. Many voting rules, like plurality, are shown to be vulnerable as the attacker can target specific voters to manipulate the result. What if a local differential privacy (LDP) mechanism is adopted such that the true preference of a voter is never revealed in pre-election polls? In this case, the attacker can only infer stochastic information about a voter's true preference, and this may cause the manipulation of the electoral result significantly harder. The goal of this paper is to provide a quantitative study on the effect of adopting LDP mechanisms on a voting system. We introduce the metric PoLDP (power of LDP) that quantitatively measures the difference between the attacker's manipulation cost under LDP mechanisms and that without LDP mechanisms. The larger PoLDP is, the more robustness LDP mechanisms can add to a voting system. We give a full characterization of PoLDP for the voting system with plurality rule and provide general guidance towards the application of LDP mechanisms. △ Less

Submitted 3 May, 2022; v1 submitted 2 May, 2022; originally announced May 2022.

arXiv:2203.14556 [pdf, other]

Pyramid Feature Alignment Network for Video Deblurring

Authors: Leitian Tao, Zhenzhong Chen

Abstract: Video deblurring remains a challenging task due to various causes of blurring. Traditional methods have considered how to utilize neighboring frames by the single-scale alignment for restoration. However, they typically suffer from misalignment caused by severe blur. In this work, we aim to better utilize neighboring frames with high efficient feature alignment. We propose a Pyramid Feature Alignm… ▽ More Video deblurring remains a challenging task due to various causes of blurring. Traditional methods have considered how to utilize neighboring frames by the single-scale alignment for restoration. However, they typically suffer from misalignment caused by severe blur. In this work, we aim to better utilize neighboring frames with high efficient feature alignment. We propose a Pyramid Feature Alignment Network (PFAN) for video deblurring. First, the multi-scale feature of blurry frames is extracted with the strategy of Structure-to-Detail Downsampling (SDD) before alignment. This downsampling strategy makes the edges sharper, which is helpful for alignment. Then we align the feature at each scale and reconstruct the image at the corresponding scale. This strategy effectively supervises the alignment at each scale, overcoming the problem of propagated errors from the above scales at the alignment stage. To better handle the challenges of complex and large motions, instead of aligning features at each scale separately, lower-scale motion information is used to guide the higher-scale motion estimation. Accordingly, a Cascade Guided Deformable Alignment (CGDA) is proposed to integrate coarse motion into deformable convolution for finer and more accurate alignment. As demonstrated in extensive experiments, our proposed PFAN achieves superior performance with competitive speed compared to the state-of-the-art methods. △ Less

Submitted 28 March, 2022; originally announced March 2022.

arXiv:2201.13329 [pdf, other]

Can Adversarial Training Be Manipulated By Non-Robust Features?

Authors: Lue Tao, Lei Feng, Hongxin Wei, Jinfeng Yi, Sheng-Jun Huang, Songcan Chen

Abstract: Adversarial training, originally designed to resist test-time adversarial examples, has shown to be promising in mitigating training-time availability attacks. This defense ability, however, is challenged in this paper. We identify a novel threat model named stability attack, which aims to hinder robust availability by slightly manipulating the training data. Under this threat, we show that advers… ▽ More Adversarial training, originally designed to resist test-time adversarial examples, has shown to be promising in mitigating training-time availability attacks. This defense ability, however, is challenged in this paper. We identify a novel threat model named stability attack, which aims to hinder robust availability by slightly manipulating the training data. Under this threat, we show that adversarial training using a conventional defense budget $ε$ provably fails to provide test robustness in a simple statistical setting, where the non-robust features of the training data can be reinforced by $ε$-bounded perturbation. Further, we analyze the necessity of enlarging the defense budget to counter stability attacks. Finally, comprehensive experiments demonstrate that stability attacks are harmful on benchmark datasets, and thus the adaptive defense is necessary to maintain robustness. Our code is available at https://github.com/TLMichael/Hypocritical-Perturbation. △ Less

Submitted 8 October, 2022; v1 submitted 31 January, 2022; originally announced January 2022.

Comments: NeurIPS 2022

arXiv:2201.03335 [pdf, other]

DeepKE: A Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population

Authors: Ningyu Zhang, Xin Xu, Liankuan Tao, Haiyang Yu, Hongbin Ye, Shuofei Qiao, Xin Xie, Xiang Chen, Zhoubo Li, Lei Li, Xiaozhuan Liang, Yunzhi Yao, Shumin Deng, Peng Wang, Wen Zhang, Zhenru Zhang, Chuanqi Tan, Qiang Chen, Feiyu Xiong, Fei Huang, Guozhou Zheng, Huajun Chen

Abstract: We present an open-source and extensible knowledge extraction toolkit DeepKE, supporting complicated low-resource, document-level and multimodal scenarios in the knowledge base population. DeepKE implements various information extraction tasks, including named entity recognition, relation extraction and attribute extraction. With a unified framework, DeepKE allows developers and researchers to cus… ▽ More We present an open-source and extensible knowledge extraction toolkit DeepKE, supporting complicated low-resource, document-level and multimodal scenarios in the knowledge base population. DeepKE implements various information extraction tasks, including named entity recognition, relation extraction and attribute extraction. With a unified framework, DeepKE allows developers and researchers to customize datasets and models to extract information from unstructured data according to their requirements. Specifically, DeepKE not only provides various functional modules and model implementation for different tasks and scenarios but also organizes all components by consistent frameworks to maintain sufficient modularity and extensibility. We release the source code at GitHub in https://github.com/zjunlp/DeepKE with Google Colab tutorials and comprehensive documents for beginners. Besides, we present an online system in http://deepke.openkg.cn/EN/re_doc_show.html for real-time extraction of various tasks, and a demo video. △ Less

Submitted 18 September, 2023; v1 submitted 10 January, 2022; originally announced January 2022.

Comments: Accepted by EMNLP 2022 System Demonstrations and the project website is http://deepke.zjukg.cn/

arXiv:2111.04060 [pdf, other]

Are we ready for a new paradigm shift? A Survey on Visual Deep MLP

Authors: Ruiyang Liu, Yinghui Li, Linmi Tao, Dun Liang, Hai-Tao Zheng

Abstract: Recently, the proposed deep MLP models have stirred up a lot of interest in the vision community. Historically, the availability of larger datasets combined with increased computing capacity leads to paradigm shifts. This review paper provides detailed discussions on whether MLP can be a new paradigm for computer vision. We compare the intrinsic connections and differences between convolution, sel… ▽ More Recently, the proposed deep MLP models have stirred up a lot of interest in the vision community. Historically, the availability of larger datasets combined with increased computing capacity leads to paradigm shifts. This review paper provides detailed discussions on whether MLP can be a new paradigm for computer vision. We compare the intrinsic connections and differences between convolution, self-attention mechanism, and Token-mixing MLP in detail. Advantages and limitations of Token-mixing MLP are provided, followed by careful analysis of recent MLP-like variants, from module design to network architecture, and their applications. In the GPU era, the locally and globally weighted summations are the current mainstreams, represented by the convolution and self-attention mechanism, as well as MLP. We suggest the further development of paradigm to be considered alongside the next-generation computing devices. △ Less

Submitted 25 April, 2022; v1 submitted 7 November, 2021; originally announced November 2021.

Comments: With the development of MLP, the survey has been updated to the latest version in April

arXiv:2111.01430 [pdf, other]

CycleGAN with Dual Adversarial Loss for Bone-Conducted Speech Enhancement

Authors: Qing Pan, Teng Gao, Jian Zhou, Huabin Wang, Liang Tao, Hon Keung Kwan

Abstract: Compared with air-conducted speech, bone-conducted speech has the unique advantage of shielding background noise. Enhancement of bone-conducted speech helps to improve its quality and intelligibility. In this paper, a novel CycleGAN with dual adversarial loss (CycleGAN-DAL) is proposed for bone-conducted speech enhancement. The proposed method uses an adversarial loss and a cycle-consistent loss s… ▽ More Compared with air-conducted speech, bone-conducted speech has the unique advantage of shielding background noise. Enhancement of bone-conducted speech helps to improve its quality and intelligibility. In this paper, a novel CycleGAN with dual adversarial loss (CycleGAN-DAL) is proposed for bone-conducted speech enhancement. The proposed method uses an adversarial loss and a cycle-consistent loss simultaneously to learn forward and cyclic mapping, in which the adversarial loss is replaced with the classification adversarial loss and the defect adversarial loss to consolidate the forward mapping. Compared with conventional baseline methods, it can learn feature mapping between bone-conducted speech and target speech without additional air-conducted speech assistance. Moreover, the proposed method also avoids the oversmooth problem which is occurred commonly in conventional statistical based models. Experimental results show that the proposed method outperforms baseline methods such as CycleGAN, GMM, and BLSTM. Keywords: Bone-conducted speech enhancement, dual adversarial loss, Parallel CycleGAN, high frequency speech reconstruction △ Less

Submitted 2 November, 2021; originally announced November 2021.

arXiv:2111.01342 [pdf, other]

Attention-Guided Generative Adversarial Network for Whisper to Normal Speech Conversion

Authors: Teng Gao, Jian Zhou, Huabin Wang, Liang Tao, Hon Keung Kwan

Abstract: Whispered speech is a special way of pronunciation without using vocal cord vibration. A whispered speech does not contain a fundamental frequency, and its energy is about 20dB lower than that of a normal speech. Converting a whispered speech into a normal speech can improve speech quality and intelligibility. In this paper, a novel attention-guided generative adversarial network model incorporati… ▽ More Whispered speech is a special way of pronunciation without using vocal cord vibration. A whispered speech does not contain a fundamental frequency, and its energy is about 20dB lower than that of a normal speech. Converting a whispered speech into a normal speech can improve speech quality and intelligibility. In this paper, a novel attention-guided generative adversarial network model incorporating an autoencoder, a Siamese neural network, and an identity mapping loss function for whisper to normal speech conversion (AGAN-W2SC) is proposed. The proposed method avoids the challenge of estimating the fundamental frequency of the normal voiced speech converted from a whispered speech. Specifically, the proposed model is more amendable to practical applications because it does not need to align speech features for training. Experimental results demonstrate that the proposed AGAN-W2SC can obtain improved speech quality and intelligibility compared with dynamic-time-warping-based methods. △ Less

Submitted 1 November, 2021; originally announced November 2021.

arXiv:2107.06261 [pdf, other]

Tight running times for minimum $\ell_q$-norm load balancing: beyond exponential dependencies on $1/ε$

Authors: Lin Chen, Liangde Tao, José Verschae

Abstract: We consider a classical scheduling problem on $m$ identical machines. For an arbitrary constant $q>1$, the aim is to assign jobs to machines such that $\sum_{i=1}^m C_i^q$ is minimized, where $C_i$ is the total processing time of jobs assigned to machine $i$. It is well known that this problem is strongly NP-hard. Under mild assumptions, the running time of an $(1+ε)$-approximation algorithm for… ▽ More We consider a classical scheduling problem on $m$ identical machines. For an arbitrary constant $q>1$, the aim is to assign jobs to machines such that $\sum_{i=1}^m C_i^q$ is minimized, where $C_i$ is the total processing time of jobs assigned to machine $i$. It is well known that this problem is strongly NP-hard. Under mild assumptions, the running time of an $(1+ε)$-approximation algorithm for a strongly NP-hard problem cannot be polynomial on $1/ε$, unless $\text{P}=\text{NP}$. For most problems in the literature, this translates into algorithms with running time at least as large as $2^{Ω(1/\varepsilon)}+n^{O(1)}$. For the natural scheduling problem above, we establish the existence of an algorithm which violates this threshold. More precisely, we design a PTAS that runs in $2^{\tilde{O}(\sqrt{1/ε})}+n^{O(1)}$ time. This result is in sharp contrast to the closely related minimum makespan variant, where an exponential lower bound is known under the exponential time hypothesis (ETH). We complement our result with an essentially matching lower bound on the running time, showing that our algorithm is best-possible under ETH. The lower bound proof exploits new number-theoretical constructions for variants of progression-free sets, which might be of independent interest. Furthermore, we provide a fine-grained characterization on the running time of a PTAS for this problem depending on the relation between $ε$ and the number of machines $m$. More precisely, our lower bound only holds when $m=Θ(\sqrt{1/ε})$. Better algorithms, that go beyond the lower bound, exist for other values of $m$. In particular, there even exists an algorithm with running time polynomial in $1/ε$ if we restrict ourselves to instances with $m=Ω(1/ε\log^21/ε)$. △ Less

Submitted 13 July, 2021; originally announced July 2021.

arXiv:2107.02713 [pdf, other]

doi 10.1109/TIP.2022.3181511

Predicate correlation learning for scene graph generation

Authors: Leitian Tao, Li Mi, Nannan Li, Xianhang Cheng, Yaosi Hu, Zhenzhong Chen

Abstract: For a typical Scene Graph Generation (SGG) method, there is often a large gap in the performance of the predicates' head classes and tail classes. This phenomenon is mainly caused by the semantic overlap between different predicates as well as the long-tailed data distribution. In this paper, a Predicate Correlation Learning (PCL) method for SGG is proposed to address the above two problems by tak… ▽ More For a typical Scene Graph Generation (SGG) method, there is often a large gap in the performance of the predicates' head classes and tail classes. This phenomenon is mainly caused by the semantic overlap between different predicates as well as the long-tailed data distribution. In this paper, a Predicate Correlation Learning (PCL) method for SGG is proposed to address the above two problems by taking the correlation between predicates into consideration. To describe the semantic overlap between strong-correlated predicate classes, a Predicate Correlation Matrix (PCM) is defined to quantify the relationship between predicate pairs, which is dynamically updated to remove the matrix's long-tailed bias. In addition, PCM is integrated into a Predicate Correlation Loss function ($L_{PC}$) to reduce discouraging gradients of unannotated classes. The proposed method is evaluated on Visual Genome benchmark, where the performance of the tail classes is significantly improved when built on the existing methods. △ Less

Submitted 6 July, 2021; originally announced July 2021.

arXiv:2106.10891 [pdf, other]

Open-set Label Noise Can Improve Robustness Against Inherent Label Noise

Authors: Hongxin Wei, Lue Tao, Renchunzi Xie, Bo An

Abstract: Learning with noisy labels is a practically challenging problem in weakly supervised learning. In the existing literature, open-set noises are always considered to be poisonous for generalization, similar to closed-set noises. In this paper, we empirically show that open-set noisy labels can be non-toxic and even benefit the robustness against inherent noisy labels. Inspired by the observations, w… ▽ More Learning with noisy labels is a practically challenging problem in weakly supervised learning. In the existing literature, open-set noises are always considered to be poisonous for generalization, similar to closed-set noises. In this paper, we empirically show that open-set noisy labels can be non-toxic and even benefit the robustness against inherent noisy labels. Inspired by the observations, we propose a simple yet effective regularization by introducing Open-set samples with Dynamic Noisy Labels (ODNL) into training. With ODNL, the extra capacity of the neural network can be largely consumed in a way that does not interfere with learning patterns from clean data. Through the lens of SGD noise, we show that the noises induced by our method are random-direction, conflict-free and biased, which may help the model converge to a flat minimum with superior stability and enforce the model to produce conservative predictions on Out-of-Distribution instances. Extensive experimental results on benchmark datasets with various types of noisy labels demonstrate that the proposed method not only enhances the performance of many existing robust algorithms but also achieves significant improvement on Out-of-Distribution detection tasks even in the label noise setting. △ Less

Submitted 18 November, 2021; v1 submitted 21 June, 2021; originally announced June 2021.

Comments: Accepted by NeurIPS 2021

arXiv:2106.07030 [pdf, other]

The Backpropagation Algorithm Implemented on Spiking Neuromorphic Hardware

Authors: Alpha Renner, Forrest Sheldon, Anatoly Zlotnik, Louis Tao, Andrew Sornborger

Abstract: The capabilities of natural neural systems have inspired new generations of machine learning algorithms as well as neuromorphic very large-scale integrated (VLSI) circuits capable of fast, low-power information processing. However, it has been argued that most modern machine learning algorithms are not neurophysiologically plausible. In particular, the workhorse of modern deep learning, the backpr… ▽ More The capabilities of natural neural systems have inspired new generations of machine learning algorithms as well as neuromorphic very large-scale integrated (VLSI) circuits capable of fast, low-power information processing. However, it has been argued that most modern machine learning algorithms are not neurophysiologically plausible. In particular, the workhorse of modern deep learning, the backpropagation algorithm, has proven difficult to translate to neuromorphic hardware. In this study, we present a neuromorphic, spiking backpropagation algorithm based on synfire-gated dynamical information coordination and processing, implemented on Intel's Loihi neuromorphic research processor. We demonstrate a proof-of-principle three-layer circuit that learns to classify digits from the MNIST dataset. To our knowledge, this is the first work to show a Spiking Neural Network (SNN) implementation of the backpropagation algorithm that is fully on-chip, without a computer in the loop. It is competitive in accuracy with off-chip trained SNNs and achieves an energy-delay product suitable for edge computing. This implementation shows a path for using in-memory, massively parallel neuromorphic processors for low-power, low-latency implementation of modern deep learning applications. △ Less

Submitted 26 August, 2021; v1 submitted 13 June, 2021; originally announced June 2021.

Comments: 21 pages, 5 figures, Changes v1->v2: minor changes of text and formatting, correction of total power in supplementary Table III

Report number: LA-UR-21-24457 ACM Class: I.2.6

arXiv:2105.12106 [pdf, other]

Adversarial Attack Driven Data Augmentation for Accurate And Robust Medical Image Segmentation

Authors: Mst. Tasnim Pervin, Linmi Tao, Aminul Huq, Zuoxiang He, Li Huo

Abstract: Segmentation is considered to be a very crucial task in medical image analysis. This task has been easier since deep learning models have taken over with its high performing behavior. However, deep learning models dependency on large data proves it to be an obstacle in medical image analysis because of insufficient data samples. Several data augmentation techniques have been used to mitigate this… ▽ More Segmentation is considered to be a very crucial task in medical image analysis. This task has been easier since deep learning models have taken over with its high performing behavior. However, deep learning models dependency on large data proves it to be an obstacle in medical image analysis because of insufficient data samples. Several data augmentation techniques have been used to mitigate this problem. We propose a new augmentation method by introducing adversarial learning attack techniques, specifically Fast Gradient Sign Method (FGSM). Furthermore, We have also introduced the concept of Inverse FGSM (InvFGSM), which works in the opposite manner of FGSM for the data augmentation. This two approaches worked together to improve the segmentation accuracy, as well as helped the model to gain robustness against adversarial attacks. The overall analysis of experiments indicates a novel use of adversarial machine learning along with robustness enhancement. △ Less

Submitted 25 May, 2021; originally announced May 2021.

arXiv:2103.14824 [pdf, other]

Improving Model Robustness by Adaptively Correcting Perturbation Levels with Active Queries

Authors: Kun-Peng Ning, Lue Tao, Songcan Chen, Sheng-Jun Huang

Abstract: In addition to high accuracy, robustness is becoming increasingly important for machine learning models in various applications. Recently, much research has been devoted to improving the model robustness by training with noise perturbations. Most existing studies assume a fixed perturbation level for all training examples, which however hardly holds in real tasks. In fact, excessive perturbations… ▽ More In addition to high accuracy, robustness is becoming increasingly important for machine learning models in various applications. Recently, much research has been devoted to improving the model robustness by training with noise perturbations. Most existing studies assume a fixed perturbation level for all training examples, which however hardly holds in real tasks. In fact, excessive perturbations may destroy the discriminative content of an example, while deficient perturbations may fail to provide helpful information for improving the robustness. Motivated by this observation, we propose to adaptively adjust the perturbation levels for each example in the training process. Specifically, a novel active learning framework is proposed to allow the model to interactively query the correct perturbation level from human experts. By designing a cost-effective sampling strategy along with a new query type, the robustness can be significantly improved with a few queries. Both theoretical analysis and experimental studies validate the effectiveness of the proposed approach. △ Less

Submitted 27 March, 2021; originally announced March 2021.

Comments: To be published in AAAI-21

Showing 1–50 of 77 results for author: Tao, L