Search | arXiv e-print repository

doi 10.1145/3548606.3560561

Post-breach Recovery: Protection against White-box Adversarial Examples for Leaked DNN Models

Authors: Shawn Shan, Wenxin Ding, Emily Wenger, Haitao Zheng, Ben Y. Zhao

Abstract: Server breaches are an unfortunate reality on today's Internet. In the context of deep neural network (DNN) models, they are particularly harmful, because a leaked model gives an attacker "white-box" access to generate adversarial examples, a threat model that has no practical robust defenses. For practitioners who have invested years and millions into proprietary DNNs, e.g., medical imaging, this seems like an inevitable disaster looming on the horizon. In this paper, we consider the problem of post-breach recovery for DNN models. We propose Neo, a new system that creates new versions of leaked models, alongside an inference time filter that detects and removes adversarial examples generated on previously leaked models. The classification surfaces of different model versions are slightly offset (by introducing hidden distributions), and Neo detects the overfitting of attacks to the leaked model used in its generation. We show that across a variety of tasks and attack methods, Neo is able to filter out attacks from leaked models with very high accuracy, and provides strong protection (7-10 recoveries) against attackers who repeatedly breach the server. Neo performs well against a variety of strong adaptive attacks, dropping slightly in the number of breaches recoverable, and demonstrates potential as a complement to DNN defenses in the wild.

Submitted 16 October, 2022; v1 submitted 21 May, 2022; originally announced May 2022.

Journal ref: 2022 ACM Conference on Computer and Communications Security (CCS)

arXiv:2205.03817 [pdf, other]

PGADA: Perturbation-Guided Adversarial Alignment for Few-shot Learning Under the Support-Query Shift

Authors: Siyang Jiang, Wei Ding, Hsi-Wen Chen, Ming-Syan Chen

Abstract: Few-shot learning methods aim to embed the data to a low-dimensional embedding space and then classify the unseen query data to the seen support set. While these works assume that the support set and the query set lie in the same embedding space, a distribution shift usually occurs between the support set and the query set, i.e., the Support-Query Shift, in the real world. Though optimal transportation has shown convincing results in aligning different distributions, we find that small perturbations in the images would significantly misguide the optimal transportation and thus degrade the model performance. To relieve the misalignment, we first propose a novel adversarial data augmentation method, namely Perturbation-Guided Adversarial Alignment (PGADA), which generates hard examples in a self-supervised manner. In addition, we introduce Regularized Optimal Transportation to derive a smooth optimal transportation plan. Extensive experiments on three benchmark datasets show that our framework significantly outperforms eleven state-of-the-art methods.

Submitted 8 May, 2022; originally announced May 2022.

arXiv:2205.00292 [pdf, other]

doi 10.1103/PhysRevA.106.012604

Dynamic quantum-enhanced sensing without entanglement in central spin systems

Authors: Wenkui Ding, Yanxia Liu, Zhenyu Zheng, Shu Chen

Abstract: We propose a dynamic quantum sensing scheme by using a quantum many-spin system composed of a central spin interacting with many surrounding spins. Starting from a generalized Ising ring model, we investigate the error propagation formula of the central spin and it indicates that Heisenberg scaling can be reached while the probe state only needs to be a product state. Particularly, we derive an analytical form of the dynamic quantum Fisher information in a limit case, which explicitly exhibits the Heisenberg scaling. By comparing with numerical results, we demonstrate that the general case can be well approximated by the analytical result when the coupling strength among the surrounding spins is much weaker than the coupling strength between the central and surrounding spins. This analytic result guides us to find the appropriate probe state and the proper measurement time, to achieve the Heisenberg scaling in realistic situations. Furthermore, we investigate various effects which are important in practical quantum systems, including the central spin Zeeman term, the anisotropy of the hyperfine interaction and the inhomogeneity of the hyperfine coupling strength. Our result indicates that the dynamic quantum-enhanced sensing scheme seems feasible in realistic quantum central spin systems, like semiconductor quantum dots.

Submitted 30 April, 2022; originally announced May 2022.

arXiv:2204.10972 [pdf, other]

GRM: Gradient Rectification Module for Visual Place Retrieval

Authors: Boshu Lei, Wenjie Ding, Limeng Qiao, Xi Qiu

Abstract: Visual place retrieval aims to search images in the database that depict similar places as the query image. However, global descriptors encoded by the network usually fall into a low dimensional principal space, which is harmful to the retrieval performance. We first analyze the cause of this phenomenon, pointing out that it is due to degraded distribution of the gradients of descriptors. Then, we propose Gradient Rectification Module (GRM) to alleviate this issue. GRM is appended after the final pooling layer and can rectify gradients to the complementary space of the principal space. With GRM, the network is encouraged to generate descriptors more uniformly in the whole space. At last, we conduct experiments on multiple datasets and generalize our method to classification task under prototype learning framework.

Submitted 27 February, 2023; v1 submitted 22 April, 2022; originally announced April 2022.

Comments: Accepted to the 2023 International Conference on Robotics and Automation (ICRA 2023)

arXiv:2204.10520 [pdf, ps, other]

High rectifying performance of heterojunctions with interface between armchair C$_3$N nanoribbons with and without edge H-passivation

Authors: Jie Zhang, Wence Ding, Xiaobo Li, Guanghui Zhou

Abstract: Two-dimensional polyaniline with C$_3$N stoichiometry is a newly fabricated layered material that has been expected to possess fascinating electronic, thermal, mechanical and chemical properties. The nature of its counterpart nano-ribbons/structures, which offer even more tunability in property because of the unique quantum confinement and edge effect, however, has not been revealed sufficiently. Here, using the first-principles calculation based on density functional theory and nonequilibrium Green's function technique, we first perform a study on the electron band structure of armchair C$_3$N nanoribbons (AC$_3$NNRs) without and with H-passivation. The calculated results show that the pristine AC$_3$NNRs are metal, while the H-passivated ones are either direct or indirect band gap semiconductors depending on the detailed edge atomic configurations. Then we propose a lateral planar homogenous junction with an interface between the pristine and H-passivated AC$_3$NNRs, in which a Schottky-like barrier forms. Interestingly, our further transport calculation demonstrates that this AC$_3$NNRs-based heterojunction exhibits a good rectification behavior. Specifically, the average rectification ratio (RR) can reach up to $10^3$ in the bias regime from 0.2 to 0.4 V. Particularly, extending the length of the semiconductor part in the heterojunction leads to a decrease of the current through the junction, but the RR can be enlarged obviously. The average RR increases to the order of $10^4$ in the bias from 0.25 to 0.40 V, with the boosted maximum up to $10^5$ at 0.35 V. The findings of this work may be serviceable for the design of functional nanodevices based on AC$_3$NNRs in the future.

Submitted 22 April, 2022; originally announced April 2022.

Comments: 9 pages, 8 figures

arXiv:2204.09301 [pdf, ps, other]

Bistable pulsating fronts in slowly oscillating environments

Authors: Weiwei Ding, François Hamel, Xing Liang

Abstract: We consider reaction-diffusion fronts in spatially periodic bistable media with large periods. Whereas the homogenization regime associated with small periods had been well studied for bistable or Fisher-KPP reactions and, in the latter case, a formula for the limit minimal speeds of fronts in media with large periods had also been obtained thanks to the linear formulation of these minimal speeds and their monotonicity with respect to the period, the main remaining open question is concerned with fronts in bistable environments with large periods. In bistable media the unique front speeds are not linearly determined and are not monotone with respect to the spatial period in general, making the analysis of the limit of large periods more intricate. We show in this paper the existence of and an explicit formula for the limit of bistable front speeds as the spatial period goes to infinity. We also prove that the front profiles converge to a family of front profiles associated with spatially homogeneous equations. The main results are based on uniform estimates on the spatial width of the fronts, which themselves use zero number properties and intersection arguments.

Submitted 1 July, 2024; v1 submitted 20 April, 2022; originally announced April 2022.

arXiv:2204.06635 [pdf, other]

doi 10.1109/TFUZZ.2019.2949771

A Novel Approach for Optimum-Path Forest Classification Using Fuzzy Logic

Authors: Renato W. R. de Souza, João V. C. de Oliveira, Leandro A. Passos, Weiping Ding, João P. Papa, Victor Hugo C. de Albuquerque

Abstract: In the past decades, fuzzy logic has played an essential role in many research areas. Alongside, graph-based pattern recognition has shown to be of great importance due to its flexibility in partitioning the feature space using the background from graph theory. Some years ago, a new framework for supervised, semi-supervised, and unsupervised learning named Optimum-Path Forest (OPF) was proposed with competitive results in several applications, besides comprising a low computational burden. In this paper, we propose the Fuzzy Optimum-Path Forest, an improved version of the standard OPF classifier that learns the samples' membership in an unsupervised fashion, which are further incorporated during supervised training. Such information is used to identify the most relevant training samples, thus improving the classification step. Experiments conducted over twelve public datasets highlight the robustness of the proposed approach, which behaves similarly to standard OPF in worst-case scenarios.

Submitted 13 April, 2022; originally announced April 2022.

Journal ref: IEEE Transactions on Fuzzy Systems 28.12 (2019): 3076-3086

arXiv:2204.04645 [pdf, other]

Self-Supervised Audio-and-Text Pre-training with Extremely Low-Resource Parallel Data

Authors: Yu Kang, Tianqiao Liu, Hang Li, Yang Hao, Wenbiao Ding

Abstract: Multimodal pre-training for audio-and-text has recently been proved to be effective and has significantly improved the performance of many downstream speech understanding tasks. However, these state-of-the-art pre-training audio-text models work well only when provided with large amounts of parallel audio-and-text data, which brings challenges on many languages that are rich in unimodal corpora but scarce of parallel cross-modal corpus. In this paper, we investigate whether it is possible to pre-train an audio-text multimodal model with extremely low-resource parallel data and extra non-parallel unimodal data. Our pre-training framework consists of the following components: (1) Intra-modal Denoising Auto-Encoding (IDAE), which is able to reconstruct input text (audio) representations from a noisy version of itself. (2) Cross-modal Denoising Auto-Encoding (CDAE), which is pre-trained to reconstruct the input text (audio), given both a noisy version of the input text (audio) and the corresponding translated noisy audio features (text embeddings). (3) Iterative Denoising Process (IDP), which iteratively translates raw audio (text) and the corresponding text embeddings (audio features) translated from previous iteration into the new less-noisy text embeddings (audio features). We adapt a dual cross-modal Transformer as our backbone model which consists of two unimodal encoders for IDAE and two cross-modal encoders for CDAE and IDP. Our method achieves comparable performance on multiple downstream speech understanding tasks compared with the model pre-trained on fully parallel data, demonstrating the great potential of the proposed method. Our code is available at: https://github.com/KarlYuKang/Low-Resource-Multimodal-Pre-training.

Submitted 10 April, 2022; originally announced April 2022.

Comments: AAAI 2022

arXiv:2204.04338 [pdf, other]

doi 10.1016/j.asoc.2021.108359

Fuzzy temporal convolutional neural networks in P300-based Brain-computer interface for smart home interaction

Authors: Christian Flores Vega, Jonathan Quevedo, Elmer Escandón, Mehrin Kiani, Weiping Ding, Javier Andreu-Perez

Abstract: The processing and classification of electroencephalographic signals (EEG) are increasingly performed using deep learning frameworks, such as convolutional neural networks (CNNs), to generate abstract features from brain data, automatically paving the way for remarkable classification prowess. However, EEG patterns exhibit high variability across time and uncertainty due to noise. It is a significant problem to be addressed in P300-based Brain Computer Interface (BCI) for smart home interaction. It operates in a non-optimal natural environment where added noise is often present. In this work, we propose a sequential unification of temporal convolutional networks (TCNs) modified to EEG signals, LSTM cells, with a fuzzy neural block (FNB), which we called EEG-TCFNet. Fuzzy components may enable a higher tolerance to noisy conditions. We applied three different architectures comparing the effect of using block FNB to classify a P300 wave to build a BCI for smart home interaction with healthy and post-stroke individuals. Our results reported a maximum classification accuracy of 98.6% and 74.3% using the proposed method of EEG-TCFNet in subject-dependent strategy and subject-independent strategy, respectively. Overall, FNB usage in all three CNN topologies outperformed those without FNB. In addition, we compared the addition of FNB to other state-of-the-art methods and obtained higher classification accuracies on account of the integration with FNB. The remarkable performance of the proposed model, EEG-TCFNet, and the general integration of fuzzy units to other classifiers would pave the way for enhanced P300-based BCIs for smart home interaction within natural settings.

Submitted 8 April, 2022; originally announced April 2022.

Journal ref: Applied Soft Computing 117 (2022) 108359

arXiv:2204.02321 [pdf, other]

SAFARI: Sparsity enabled Federated Learning with Limited and Unreliable Communications

Authors: Yuzhu Mao, Zihao Zhao, Meilin Yang, Le Liang, Yang Liu, Wenbo Ding, Tian Lan, Xiao-Ping Zhang

Abstract: Federated learning (FL) enables edge devices to collaboratively learn a model in a distributed fashion. Many existing studies have focused on improving communication efficiency of high-dimensional models and addressing bias caused by local updates. However, most FL algorithms are either based on reliable communications or assume fixed and known unreliability characteristics. In practice, networks could suffer from dynamic channel conditions and non-deterministic disruptions, with time-varying and unknown characteristics. To this end, in this paper we propose a sparsity enabled FL framework with both communication efficiency and bias reduction, termed SAFARI. It makes novel use of a similarity among client models to rectify and compensate for bias that results from unreliable communications. More precisely, sparse learning is implemented on local clients to mitigate communication overhead, while to cope with unreliable communications, a similarity-based compensation method is proposed to provide surrogates for missing model updates. We analyze SAFARI under bounded dissimilarity and with respect to sparse models. It is demonstrated that SAFARI under unreliable communications is guaranteed to converge at the same rate as the standard FedAvg with perfect communications. Implementations and evaluations on CIFAR-10 dataset validate the effectiveness of SAFARI by showing that it can achieve the same convergence speed and accuracy as FedAvg with perfect communications, with up to 80% of the model weights being pruned and a high percentage of client updates missing in each round.

Submitted 5 April, 2022; originally announced April 2022.

arXiv:2203.16003 [pdf]

doi 10.1103/PhysRevApplied.18.054078

Thermal Modulation of Gigahertz Surface Acoustic Waves on Lithium Niobate

Authors: Linbo Shao, Sophie W. Ding, Yunwei Ma, Yuhao Zhang, Neil Sinclair, Marko Loncar

Abstract: Surface acoustic wave (SAW) devices have a wide range of applications in microwave signal processing. Microwave SAW components benefit from higher quality factors and much smaller crosstalk when compared to their electromagnetic counterparts. Efficient routing and modulation of SAWs are essential for building large-scale and versatile acoustic-wave circuits. Here, we demonstrate integrated thermo-acoustic modulators using two SAW platforms: bulk lithium niobate and thin-film lithium niobate on sapphire. In both approaches, the gigahertz-frequency SAWs are routed by integrated acoustic waveguides while on-chip microheaters are used to locally change the temperature and thus control the phase of SAW. Using this approach, we achieved phase changes of over 720 degrees with a responsivity of 2.6 deg/mW for bulk lithium niobate and 0.52 deg/mW for lithium niobate on sapphire. Furthermore, we demonstrated amplitude modulation of SAWs using acoustic Mach-Zehnder interferometers. Our thermo-acoustic modulators can enable reconfigurable acoustic signal processing for next generation wireless communications and microwave systems.

Submitted 27 October, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

Journal ref: Phys. Rev. Applied 18, 054078 (2022)

arXiv:2203.13149 [pdf]

doi 10.1016/j.ijmecsci.2023.108903

Observation of Dual-band Topological Corner Modes in Acoustic Kagome Lattice with Long-range Interactions

Authors: Chen Chen, Tianning Chen, Wei Ding, Jian Zhu

Abstract: The recent exotic topological corner modes (CMs) in photonic higher-order topological insulators with long-distance interactions have attracted considerable attention and enriched the physics beyond that of their condensed-matter counterparts. However, the next-nearest-neighbour (NNN) coupling appears between NNN lattice sites unselectively, and the NNN coupling and its associated dynamics remain elusive in acoustics because the waveguide-resonator model, an analogue of the tight-binding model (TBM), drastically hinders NNN coupling. Here, in acoustics, we demonstrate selective NNN coupling-induced CMs in a split-ring resonator-based kagome crystal and observe dual-band CMs. Three types of CMs are demonstrated in the first bulk gap, which can be explained by the TBM considering NNN coupling, and one type of CM is observed in the second. All of these findings are verified theoretically and experimentally, which reveals rich physics in acoustics, opening a new way towards tunable or multi-band metamaterials design, and offering opportunities for intriguing acoustic manipulation and energy localization.

Submitted 24 March, 2022; originally announced March 2022.

Comments: 17 pages, 5 figures

Journal ref: International Journal of Mechanical Sciences, 265, 108903, 2023

arXiv:2203.05784 [pdf]

AI-enabled Automatic Multimodal Fusion of Cone-Beam CT and Intraoral Scans for Intelligent 3D Tooth-Bone Reconstruction and Clinical Applications

Authors: Jin Hao, Jiaxiang Liu, Jin Li, Wei Pan, Ruizhe Chen, Huimin Xiong, Kaiwei Sun, Hangzheng Lin, Wanlu Liu, Wanghui Ding, Jianfei Yang, Haoji Hu, Yueling Zhang, Yang Feng, Zeyu Zhao, Huikai Wu, Youyi Zheng, Bing Fang, Zuozhu Liu, Zhihe Zhao

Abstract: A critical step in virtual dental treatment planning is to accurately delineate all tooth-bone structures from CBCT with high fidelity and accurate anatomical information. Previous studies have established several methods for CBCT segmentation using deep learning. However, the inherent resolution discrepancy of CBCT and the loss of occlusal and dentition information largely limited its clinical applicability. Here, we present a Deep Dental Multimodal Analysis (DDMA) framework consisting of a CBCT segmentation model, an intraoral scan (IOS) segmentation model (the most accurate digital dental model), and a fusion model to generate 3D fused crown-root-bone structures with high fidelity and accurate occlusal and dentition information. Our model was trained with a large-scale dataset with 503 CBCT and 28,559 IOS meshes manually annotated by experienced human experts. For CBCT segmentation, we use a five-fold cross validation test, each with 50 CBCT, and our model achieves an average Dice coefficient and IoU of 93.99% and 88.68%, respectively, significantly outperforming the baselines. For IOS segmentations, our model achieves an mIoU of 93.07% and 95.70% on the maxillary and mandible on a test set of 200 IOS meshes, which are 1.77% and 3.52% higher than the state-of-the-art method. Our DDMA framework takes about 20 to 25 minutes to generate the fused 3D mesh model following the sequential processing order, compared to over 5 hours by human experts. Notably, our framework has been incorporated into software by a clear aligner manufacturer, and real-world clinical cases demonstrate that our model can visualize crown-root-bone structures during the entire orthodontic treatment and can predict risks like dehiscence and fenestration. These findings demonstrate the potential of multi-modal deep learning to improve the quality of digital dental models and help dentists make better clinical decisions.

Submitted 11 March, 2022; originally announced March 2022.

Comments: 30 pages, 6 figures, 3 tables

arXiv:2203.01799 [pdf, other]

Coupling Deep Learning with Full Waveform Inversion

Authors: Wen Ding, Kui Ren, Lu Zhang

Abstract: Full waveform inversion (FWI) aims at reconstructing unknown physical coefficients in wave equations using the wave field data generated from multiple incoming sources. In this work, we propose an offline-online computational strategy for coupling classical least-squares based computational inversion with modern deep learning based approaches for FWI to achieve advantages that cannot be achieved with only one of the components. In a nutshell, we develop an offline learning strategy to construct a robust approximation to the inverse operator and utilize it to design a new objective function for the online inversion with new datasets. We demonstrate through numerical simulations that our coupling strategy improves the computational efficiency of FWI with reliable offline training on moderate computational resources (in terms of both the size of the training dataset and the computational cost needed).

Submitted 3 March, 2022; originally announced March 2022.

MSC Class: 35R30; 49N45; 65M32; 74J25; 78A46; 86A22

arXiv:2202.12082 [pdf, ps, other]

Algebraic-Dynamical Theory for Quantum Many-body Hamiltonians: A Formalized Approach To Strongly Interacting Systems

Authors: Wenxin Ding

Abstract: Non-commutative algebras and entanglement are two of the most important hallmarks of many-body quantum systems. Dynamical perturbation methods are the most widely used approaches for quantum many-body systems. While the study of entanglement-based numerical methods has been booming recently, the traditional dynamical perturbation methods have not benefited from the study of quantum entanglement. In this work, we formulate an algebraic-dynamical theory (ADT) by combining the power of quantum algebras and dynamical methods in which quantum entanglement naturally emerges as the organizing principle. We start by introducing a complete operator basis set (COBS), with which an arbitrary state, either pure or mixed, can be represented by the expectation values of COBS. Then we establish a complete mapping from a given state to a complete set of dynamical correlation functions of the state through the Heisenberg- and Schwinger-Dyson-equations-of-motion (SDEOM). The completeness of COBS and the mapping ensures ADT to be a mathematically complete framework in principle. Applying ADT to many-body systems on lattices, we find that the quantum entanglement is represented by the cumulant structure of expectation values of the many-body COBS. The cumulant structure of the state forms a hierarchy in correlations. More importantly, such static correlational hierarchy is inherited by the dynamical correlations and their SDEOM. We propose that the dynamical hierarchy is also carried into any perturbative calculation on that state. We demonstrate the validity of such perturbation hierarchy with an explicit example, in which we show that a single-particle-type perturbative calculation fails while a many-body perturbation following the hierarchy succeeds. We also discuss the computation and approximation schemes of ADT and its implications to other strong coupling theories like parton and slave particle methods.

Submitted 24 February, 2022; originally announced February 2022.

Comments: 14 pages, 0 figure

arXiv:2202.06750 [pdf, other]

doi 10.1038/s41524-022-00840-5

On the role of the microstructure in the deformation of porous solids

Authors: Sansit Patnaik, Mehdi Jokar, Wei Ding, Fabio Semperlotti

Abstract: This study explores the role that the microstructure plays in determining the macroscopic static response of porous elastic continua and exposes the occurrence of position-dependent nonlocal effects that are strictly correlated to the configuration of the microstructure. Then, a nonlocal continuum theory based on variable-order fractional calculus is developed in order to accurately capture the complex spatially distributed nonlocal response. The remarkable potential of the fractional approach is illustrated by simulating the nonlinear thermoelastic response of porous beams. The performance, evaluated both in terms of accuracy and computational efficiency, is directly contrasted with high-fidelity finite element models that fully resolve the pores' geometry. Results indicate that the reduced-order representation of the porous microstructure, captured by the synthetic variable-order parameter, offers a robust and accurate representation of the multiscale material architecture that largely outperforms classical approaches based on the concept of average porosity.

Submitted 28 September, 2022; v1 submitted 4 February, 2022; originally announced February 2022.

Comments: 11 pages and 5 figures

Journal ref: npj Comput Mater 8, 152 (2022)

arXiv:2202.02215 [pdf, other]

A Survey on Safety-Critical Driving Scenario Generation -- A Methodological Perspective

Authors: Wenhao Ding, Chejian Xu, Mansur Arief, Haohong Lin, Bo Li, Ding Zhao

Abstract: Autonomous driving systems have witnessed significant development during the past years thanks to advances in machine learning-enabled sensing and decision-making algorithms. One critical challenge for their massive deployment in the real world is their safety evaluation. Most existing driving systems are still trained and evaluated on naturalistic scenarios collected from daily life or heuristically-generated adversarial ones. However, the large population of cars, in general, leads to an extremely low collision rate, indicating that safety-critical scenarios are rare in the collected real-world data. Thus, methods to artificially generate scenarios become crucial to measure the risk and reduce the cost. In this survey, we focus on the algorithms of safety-critical scenario generation in autonomous driving. We first provide a comprehensive taxonomy of existing algorithms by dividing them into three categories: data-driven generation, adversarial generation, and knowledge-based generation. Then, we discuss useful tools for scenario generation, including simulation platforms and packages. Finally, we extend our discussion to five main challenges of current works -- fidelity, efficiency, diversity, transferability, controllability -- and research opportunities highlighted by these challenges.

Submitted 20 June, 2023; v1 submitted 4 February, 2022; originally announced February 2022.

Comments: 18 pages, 5 figures. IEEE Transactions on Intelligent Transportation Systems (T-ITS) 2023

arXiv:2202.02000 [pdf, other]

Cross-Modality Multi-Atlas Segmentation via Deep Registration and Label Fusion

Authors: Wangbin Ding, Lei Li, Xiahai Zhuang, Liqin Huang

Abstract: Multi-atlas segmentation (MAS) is a promising framework for medical image segmentation. Generally, MAS methods register multiple atlases, i.e., medical images with corresponding labels, to a target image; and the transformed atlas labels can be combined to generate target segmentation via label fusion schemes. Many conventional MAS methods employed the atlases from the same modality as the target image. However, the number of atlases with the same modality may be limited or even missing in many clinical applications. Besides, conventional MAS methods suffer from the computational burden of registration or label fusion procedures. In this work, we design a novel cross-modality MAS framework, which uses available atlases from a certain modality to segment a target image from another modality. To boost the computational efficiency of the framework, both the image registration and label fusion are achieved by well-designed deep neural networks. For the atlas-to-target image registration, we propose a bi-directional registration network (BiRegNet), which can efficiently align images from different modalities. For the label fusion, we design a similarity estimation network (SimNet), which estimates the fusion weight of each atlas by measuring its similarity to the target image. SimNet can learn multi-scale information for similarity estimation to improve the performance of label fusion. The proposed framework was evaluated by the left ventricle and liver segmentation tasks on the MM-WHS and CHAOS datasets, respectively. Results have shown that the framework is effective for cross-modality MAS in both registration and label fusion.
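
As a rough illustration of the label fusion step described in this abstract, the sketch below combines already-registered atlas labels by weighted voting. It is not the authors' SimNet: the function name `fuse_labels`, the flat-list image representation, and the use of a single scalar weight per atlas are all assumptions made for the example, and the weights stand in for whatever similarity scores a learned model would produce.

```python
from collections import defaultdict


def fuse_labels(warped_labels, weights):
    """Weighted-vote label fusion for one target image (illustrative only).

    warped_labels: list of atlas label maps, each a flat list of ints that
                   has already been registered to the target image.
    weights:       one non-negative fusion weight per atlas (a hypothetical
                   stand-in for a learned similarity score).
    """
    if len(warped_labels) != len(weights):
        raise ValueError("need exactly one weight per atlas")
    fused = []
    # zip(*...) walks over the atlases pixel by pixel.
    for pixel_labels in zip(*warped_labels):
        votes = defaultdict(float)
        for label, weight in zip(pixel_labels, weights):
            votes[label] += weight
        # The label with the largest accumulated weight wins.
        fused.append(max(votes, key=votes.get))
    return fused


if __name__ == "__main__":
    atlases = [[0, 1], [1, 1], [0, 0]]
    print(fuse_labels(atlases, [0.2, 0.7, 0.1]))
    print(fuse_labels(atlases, [1.0, 1.0, 1.0]))
```

With equal weights this reduces to plain majority voting; a highly weighted atlas can override the majority, which is the behaviour similarity-based fusion relies on.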

Submitted 28 March, 2022; v1 submitted 4 February, 2022; originally announced February 2022.

arXiv:2201.11803 [pdf, other]

On the Convergence of Heterogeneous Federated Learning with Arbitrary Adaptive Online Model Pruning

Authors: Hanhan Zhou, Tian Lan, Guru Venkataramani, Wenbo Ding

Abstract: One of the biggest challenges in Federated Learning (FL) is that client devices often have drastically different computation and communication resources for local updates. To this end, recent research efforts have focused on training heterogeneous local models obtained by pruning a shared global model. Despite empirical success, theoretical guarantees on convergence remain an open question. In this paper, we present a unifying framework for heterogeneous FL algorithms with arbitrary adaptive online model pruning and provide a general convergence analysis. In particular, we prove that under certain sufficient conditions and on both IID and non-IID data, these algorithms converge to a stationary point of standard FL for general smooth cost functions, with a convergence rate of $O(\frac{1}{\sqrt{Q}})$. Moreover, we illuminate two key factors impacting convergence: pruning-induced noise and minimum coverage index, advocating a joint design of local pruning masks for efficient training.
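
One way to picture the setting in this abstract is a server that averages each parameter only over the clients whose pruning mask retains it. The sketch below does that for flat parameter lists and also reports the smallest number of clients covering any parameter. It is an assumption-laden illustration rather than the paper's algorithm: `aggregate_pruned` is a hypothetical name, and reading "minimum coverage index" as this smallest count is an interpretation, not something the abstract defines.

```python
def aggregate_pruned(global_params, client_params, client_masks):
    """Average each parameter over the clients that kept it (illustrative).

    global_params: flat list of floats, the current shared model.
    client_params: list of flat lists, one locally updated model per client.
    client_masks:  list of flat lists of 0/1, 1 where the client kept
                   (did not prune) that parameter.
    Returns (new_global_params, min_coverage), where min_coverage is the
    smallest number of clients that kept any single parameter.
    """
    new_params = list(global_params)
    coverage = []
    for i in range(len(global_params)):
        kept = [p[i] for p, m in zip(client_params, client_masks) if m[i]]
        coverage.append(len(kept))
        if kept:
            new_params[i] = sum(kept) / len(kept)
        # If no client kept parameter i, the old global value is retained.
    return new_params, min(coverage)


if __name__ == "__main__":
    params, min_cov = aggregate_pruned(
        [0.0, 0.0, 5.0],
        [[1.0, 2.0, 0.0], [3.0, 0.0, 0.0]],
        [[1, 1, 0], [1, 0, 0]],
    )
    print(params, min_cov)
```

A parameter that no client retains is never updated in that round, which gives some intuition for why the abstract argues for designing the local masks jointly.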

Submitted 9 February, 2022; v1 submitted 27 January, 2022; originally announced January 2022.

arXiv:2201.11308 [pdf, other]

Calibration with Privacy in Peer Review

Authors: Wenxin Ding, Gautam Kamath, Weina Wang, Nihar B. Shah

Abstract: Reviewers in peer review are often miscalibrated: they may be strict, lenient, extreme, moderate, etc. A number of algorithms have previously been proposed to calibrate reviews. Such attempts at calibration can, however, leak sensitive information about which reviewer reviewed which paper. In this paper, we identify this problem of calibration with privacy, and provide a foundational building block to address it. Specifically, we present a theoretical study of this problem under a simplified-yet-challenging model involving two reviewers, two papers, and an MAP-computing adversary. Our main results establish the Pareto frontier of the tradeoff between privacy (preventing the adversary from inferring reviewer identity) and utility (accepting better papers), and design explicit computationally-efficient algorithms that we prove are Pareto optimal.

Submitted 26 January, 2022; originally announced January 2022.

Comments: 31 pages, 6 figures

arXiv:2201.03349 [pdf, other]

A Unified Granular-ball Learning Model of Pawlak Rough Set and Neighborhood Rough Set

Authors: Shuyin Xia, Cheng Wang, Guoyin Wang, Weiping Ding, Xinbo Gao, Jianhang Yu, Yujia Zhai, Zizhong Chen

Abstract: Pawlak rough set and neighborhood rough set are the two most common rough set theoretical models. Pawlak rough sets can use equivalence classes to represent knowledge, but they cannot process continuous data; neighborhood rough sets can process continuous data, but they lose the ability to use equivalence classes to represent knowledge. To this end, this paper presents a granular-ball rough set based on granular-ball computing. The granular-ball rough set can simultaneously represent Pawlak rough sets and neighborhood rough sets, so as to realize a unified representation of the two. As a result, the granular-ball rough set can not only deal with continuous data but also use equivalence classes for knowledge representation. In addition, we propose an implementation algorithm for granular-ball rough sets. The experimental results on benchmark datasets demonstrate that, due to the combination of the robustness and adaptability of granular-ball computing, the learning accuracy of the granular-ball rough set is greatly improved compared with the Pawlak rough set and the traditional neighborhood rough set. The granular-ball rough set also outperforms nine popular or state-of-the-art feature selection methods.

Submitted 14 July, 2022; v1 submitted 10 January, 2022; originally announced January 2022.

Comments: 12 pages, 18 figures

arXiv:2201.01219 [pdf, other]

Multiscale Nonlocal Elasticity: A Distributed Order Fractional Formulation

Authors: Wei Ding, Sansit Patnaik, Fabio Semperlotti

Abstract: This study presents a generalized multiscale nonlocal elasticity theory that leverages distributed order fractional calculus to accurately capture coexisting multiscale and nonlocal effects within a macroscopic continuum. The nonlocal multiscale behavior is captured via distributed order fractional constitutive relations derived from a nonlocal thermodynamic formulation. The governing equations of the inhomogeneous continuum are obtained via the Hamilton principle. As a generalization of the constant order fractional continuum theory, the distributed order theory can model complex media characterized by inhomogeneous nonlocality and multiscale effects. In order to understand the correspondence between microscopic effects and the properties of the continuum, an equivalent mass-spring lattice model is also developed by direct discretization of the distributed order elastic continuum. Detailed theoretical arguments are provided to show the equivalence between the discrete and the continuum distributed order models in terms of internal nonlocal forces, potential energy distribution, and boundary conditions. These theoretical arguments facilitate the physical interpretation of the role played by the distributed order framework within nonlocal elasticity theories. They also highlight the outstanding potential and opportunities offered by this methodology to account for multiscale nonlocal effects. The capabilities of the methodology are also illustrated via a numerical study that highlights the excellent agreement between the displacement profiles and the total potential energy predicted by the two models under various order distributions. Remarkably, multiscale effects such as displacement distortion, material softening, and energy concentration are well captured at continuum level by the distributed order theory.

Submitted 24 December, 2021; originally announced January 2022.

Comments: 31 pages, 9 images, 3 Tables

arXiv:2112.07655 [pdf, other]

doi 10.1029/2021JB023830

Deep Neural Networks for Creating Reliable PmP Database with a Case Study in Southern California

Authors: Wen Ding, Tianjue Li, Xu Yang, Kui Ren, Ping Tong

Abstract: Recent progress in artificial intelligence and machine learning makes it possible to automatically identify seismic phases from exponentially growing seismic data. Despite some exciting successes in automatic picking of the first P- and S-wave arrivals, auto-identification of later seismic phases such as the Moho-reflected PmP waves remains a significant challenge in matching the performance of experienced analysts. The main difficulty of machine-identifying PmP waves is that the identifiable PmP waves are rare, making the problem of identifying the PmP waves from a massive seismic database inherently unbalanced. In this work, by utilizing a high-quality PmP dataset (10,192 manual picks) in southern California, we develop PmPNet, a deep-neural-network-based algorithm to automatically identify PmP waves efficiently; by doing so, we accelerate the process of identifying the PmP waves. PmPNet applies techniques similar to those used in the machine learning community to address the imbalance of PmP datasets. The architecture of PmPNet is a residual neural network (ResNet)-autoencoder with an additional predictor block, where the encoder, decoder, and predictor are equipped with ResNet connections. We conduct systematic research with field data, concluding that PmPNet can efficiently achieve high precision and high recall simultaneously to automatically identify PmP waves from a massive seismic database. Applying the pre-trained PmPNet to the seismic database from January 1990 to December 1999 in southern California, we obtain nearly twice more PmP picks than the original PmP dataset, providing valuable data for other studies such as mapping the topography of the Moho discontinuity and imaging the lower crust structures of southern California.

Submitted 25 March, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

Comments: 26 pages, 16 figures, and 2 tables

arXiv:2112.05758 [pdf, other]

Edge-Enhanced Dual Discriminator Generative Adversarial Network for Fast MRI with Parallel Imaging Using Multi-view Information

Authors: Jiahao Huang, Weiping Ding, Jun Lv, Jingwen Yang, Hao Dong, Javier Del Ser, Jun Xia, Tiaojuan Ren, Stephen Wong, Guang Yang

Abstract: In clinical medicine, magnetic resonance imaging (MRI) is one of the most important tools for diagnosis, triage, prognosis, and treatment planning. However, MRI suffers from an inherently slow data acquisition process because data is collected sequentially in k-space. In recent years, most MRI reconstruction methods proposed in the literature focus on holistic image reconstruction rather than enhancing the edge information. This work steps aside from this general trend by elaborating on the enhancement of edge information. Specifically, we introduce a novel parallel imaging coupled dual discriminator generative adversarial network (PIDD-GAN) for fast multi-channel MRI reconstruction by incorporating multi-view information. The dual discriminator design aims to improve the edge information in MRI reconstruction. One discriminator is used for holistic image reconstruction, whereas the other one is responsible for enhancing edge information. An improved U-Net with local and global residual learning is proposed for the generator. Frequency channel attention blocks (FCA Blocks) are embedded in the generator for incorporating attention mechanisms. Content loss is introduced to train the generator for better reconstruction quality. We performed comprehensive experiments on the Calgary-Campinas public brain MR dataset and compared our method with state-of-the-art MRI reconstruction methods. Ablation studies of residual learning were conducted on the MICCAI13 dataset to validate the proposed modules. Results show that our PIDD-GAN provides high-quality reconstructed MR images, with well-preserved edge information. The time of single-image reconstruction is below 5ms, which meets the demand of faster processing.

Submitted 10 December, 2021; originally announced December 2021.

Comments: 33 pages, 13 figures, Applied Intelligence

arXiv:2112.04984 [pdf, other]

Robust Weakly Supervised Learning for COVID-19 Recognition Using Multi-Center CT Images

Authors: Qinghao Ye, Yuan Gao, Weiping Ding, Zhangming Niu, Chengjia Wang, Yinghui Jiang, Minhao Wang, Evandro Fei Fang, Wade Menpes-Smith, Jun Xia, Guang Yang

Abstract: The world is currently experiencing an ongoing pandemic of an infectious disease named coronavirus disease 2019 (i.e., COVID-19), which is caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Computed Tomography (CT) plays an important role in assessing the severity of the infection and can also be used to identify symptomatic and asymptomatic COVID-19 carriers. With a surge in the cumulative number of COVID-19 patients, radiologists are increasingly stressed to examine the CT scans manually. Therefore, an automated 3D CT scan recognition tool is in high demand, since manual analysis is time-consuming for radiologists and their fatigue can cause misjudgment. However, due to the various technical specifications of CT scanners located in different hospitals, the appearance of CT images can differ significantly, leading to the failure of many automated image recognition approaches. The multi-domain shift problem for multi-center and multi-scanner studies is therefore nontrivial, and solving it is crucial for dependable recognition and for reproducible and objective diagnosis and prognosis. In this paper, we proposed a COVID-19 CT scan recognition model named coronavirus information fusion and diagnosis network (CIFD-Net) that can efficiently handle the multi-domain shift problem via a new robust weakly supervised learning paradigm. Our model can resolve the problem of different appearance in CT scan images reliably and efficiently while attaining higher accuracy compared to other state-of-the-art methods.

Submitted 9 December, 2021; originally announced December 2021.

Comments: 32 pages, 8 figures, Applied Soft Computing

arXiv:2112.04294 [pdf, other]

doi 10.1109/TCSVT.2021.3134410

A Hierarchical Spatio-Temporal Graph Convolutional Neural Network for Anomaly Detection in Videos

Authors: Xianlin Zeng, Yalong Jiang, Wenrui Ding, Hongguang Li, Yafeng Hao, Zifeng Qiu

Abstract: Deep learning models have been widely used for anomaly detection in surveillance videos. Typical models are equipped with the capability to reconstruct normal videos and evaluate the reconstruction errors on anomalous videos to indicate the extent of abnormalities. However, existing approaches suffer from two disadvantages. Firstly, they can only encode the movements of each identity independently, without considering the interactions among identities which may also indicate anomalies. Secondly, they leverage inflexible models whose structures are fixed under different scenes; this configuration disables the understanding of scenes. In this paper, we propose a Hierarchical Spatio-Temporal Graph Convolutional Neural Network (HSTGCNN) to address these problems. The HSTGCNN is composed of multiple branches that correspond to different levels of graph representations. High-level graph representations encode the trajectories of people and the interactions among multiple identities while low-level graph representations encode the local body postures of each person. Furthermore, we propose a weighted combination of multiple branches that are better at different scenes. An improvement over single-level graph representations is achieved in this way. An understanding of scenes is achieved and serves anomaly detection. High-level graph representations are assigned higher weights to encode moving speed and directions of people in low-resolution videos while low-level graph representations are assigned higher weights to encode human skeletons in high-resolution videos. Experimental results show that the proposed HSTGCNN significantly outperforms current state-of-the-art models on four benchmark datasets (UCSD Pedestrian, ShanghaiTech, CUHK Avenue and IITB-Corridor) while using far fewer learnable parameters.

Submitted 10 December, 2021; v1 submitted 8 December, 2021; originally announced December 2021.

Comments: Accepted to IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT)

arXiv:2111.14369 [pdf]

Ferroelectric materials and their applications in green catalysis

Authors: Weitong Ding, Xiao Tang, Wei Li, Liangzhi Kou, Lei Liu

Abstract: The demand for renewable and environmentally friendly energy sources has attracted extensive research on high performance catalysts. Ferroelectrics, a class of materials with a switchable polarization, are unique and promising catalyst candidates due to the significant effects of polarization on surface physical and chemical properties. The band bending at the ferroelectric/semiconductor interface induced by the polarization flip promotes charge separation and transfer, thereby enhancing the photocatalytic performance. More importantly, the reactants can be selectively adsorbed on the surface of the ferroelectric materials depending on the polarization direction, which can effectively lift the basic limitations imposed by the Sabatier principle on catalytic activity. This review summarizes the latest developments of ferroelectric materials, and introduces ferroelectric-related catalytic applications. The possible research directions of 2D ferroelectric materials in chemical catalysis are discussed at the end. The review is expected to inspire extensive research interest from the physical, chemical and material science communities.

Submitted 29 November, 2021; originally announced November 2021.

arXiv:2111.09033 [pdf, other]

A Variation-Aware Quantum Circuit Mapping Approach Based on Multi-agent Cooperation

Authors: Pengcheng Zhu, Weiping Ding, Lihua Wei, Zhijin Guan, Shiguang Feng

Abstract: The quantum circuit mapping approach is an indispensable part of the software stack for the noisy intermediate-scale quantum (NISQ) device. It has a significant impact on the reliability of computational tasks on NISQ devices. To improve the overall fidelity of physical circuits, we propose a quantum circuit mapping method based on multi-agent cooperation. This approach considers the spatio-temporal variation of quantum operation quality on the NISQ device when inserting ancillary operation. It consists of two core components: the qubit placement algorithm and the qubit routing method. The qubit placement algorithm exploits the iterated local search framework to find a desirable initial mapping for the reduced symmetric form of the original circuit. The qubit routing method generates the physical circuit through multi-agent communication and collaboration. Each agent inserts the ancillary gates independently according to its environment state. The quality of the physical circuit evolves according to an information-exchanging mechanism between agents, which combines the local search and global search. To experiment on the benchmark circuits (with hundreds of quantum gates) beyond the capacity of current NISQ devices, we build a noisy simulator with gate error 10x lower than that of the latest NISQ device of IBM. The experimental results confirm the performance of our approach in improving circuit fidelity. Compared with the state-of-the-art method, our method can improve the success rate by 25.86% on average and 95.42% at maximum.

Submitted 30 November, 2021; v1 submitted 17 November, 2021; originally announced November 2021.

Comments: 13 pages, 10 figures

arXiv:2111.08154 [pdf, other]

doi 10.1109/TSMC.2019.2917599

On the utility of power spectral techniques with feature selection techniques for effective mental task classification in noninvasive BCI

Authors: Akshansh Gupta, Ramesh Kumar Agrawal, Jyoti Singh Kirar, Javier Andreu-Perez, Wei-Ping Ding, Chin-Teng Lin, Mukesh Prasad

Abstract: In this paper, classification of mental task-root Brain-Computer Interfaces (BCI) is investigated, as these are a dominant area of investigation in BCI and are of utmost interest because such systems can augment the lives of people with severe disabilities. The BCI model's performance is primarily dependent on the size of the feature vector, which is obtained through multiple channels. In the case of mental task classification, the ratio of available training samples to features is small. Very often, feature selection is used to increase this ratio for mental task classification by getting rid of irrelevant and superfluous features. This paper proposes an approach to select relevant and non-redundant spectral features for mental task classification. This can be done by using four well-known multivariate feature selection methods, viz., Bhattacharya's Distance, Ratio of Scatter Matrices, Linear Regression and Minimum Redundancy & Maximum Relevance. This work also deals with a comparative analysis of multivariate and univariate feature selection for mental task classification. After applying the above-stated method, the findings demonstrate substantial improvements in the performance of the learning model for mental task classification. Moreover, the efficacy of the proposed approach is endorsed by carrying out a robust ranking algorithm and Friedman's statistical test for finding the best combinations and comparing different combinations of power spectral density and feature selection methods.

Submitted 15 November, 2021; originally announced November 2021.

Journal ref: IEEE Transactions on Systems, Man, and Cybernetics: Systems 51.5 (2019): 3080-3092

arXiv:2111.02204 [pdf, other]

Certifiable Deep Importance Sampling for Rare-Event Simulation of Black-Box Systems

Authors: Mansur Arief, Yuanlu Bai, Wenhao Ding, Shengyi He, Zhiyuan Huang, Henry Lam, Ding Zhao

Abstract: Rare-event simulation techniques, such as importance sampling (IS), constitute powerful tools to speed up challenging estimation of rare catastrophic events. These techniques often leverage the knowledge and analysis on underlying system structures to endow desirable efficiency guarantees. However, black-box problems, especially those arising from recent safety-critical applications of AI-driven physical systems, can fundamentally undermine their efficiency guarantees and lead to dangerous under-estimation without being diagnostically detected. We propose a framework called Deep Probabilistic Accelerated Evaluation (Deep-PrAE) to design statistically guaranteed IS, by converting black-box samplers that are versatile but could lack guarantees, into one with what we call a relaxed efficiency certificate that allows accurate estimation of bounds on the rare-event probability. We present the theory of Deep-PrAE that combines the dominating point concept with rare-event set learning via deep neural network classifiers, and demonstrate its effectiveness in numerical examples including the safety-testing of intelligent driving algorithms.
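
For readers unfamiliar with importance sampling, the following minimal sketch estimates a Gaussian tail probability by sampling from a proposal shifted onto the rare-event boundary and reweighting by the likelihood ratio. This is textbook IS on a toy problem with a known answer, not the Deep-PrAE framework; the function names, the choice of a unit-variance Gaussian proposal, and the threshold of 4 are all assumptions chosen for illustration.

```python
import math
import random


def rare_event_prob_is(threshold, n_samples=100_000, seed=0):
    """Estimate P(X > threshold) for X ~ N(0, 1) by importance sampling.

    Samples are drawn from the proposal N(threshold, 1) and reweighted by
    the likelihood ratio N(0, 1) / N(threshold, 1).
    """
    rng = random.Random(seed)
    mu = threshold  # proposal mean placed on the rare-event boundary
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(mu, 1.0)
        if x > threshold:
            # phi(x) / phi(x - mu) = exp(-mu * x + mu^2 / 2)
            total += math.exp(-mu * x + 0.5 * mu * mu)
    return total / n_samples


def exact_tail(threshold):
    """Closed-form P(X > threshold) for a standard normal, for comparison."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


if __name__ == "__main__":
    print("IS estimate:", rare_event_prob_is(4.0))
    print("exact value:", exact_tail(4.0))
```

Naive Monte Carlo with the same budget would see only a handful of exceedances at this threshold, whereas roughly half of the proposal samples land in the rare set. The catch the abstract points to is that a good proposal requires knowing where the rare set lies, which is not available for a black-box system.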

Submitted 3 November, 2021; originally announced November 2021.

Comments: The conference version of this paper has appeared in AISTATS 2021 (arXiv:2006.15722)

arXiv:2110.13939 [pdf, other]

CausalAF: Causal Autoregressive Flow for Safety-Critical Driving Scenario Generation

Authors: Wenhao Ding, Haohong Lin, Bo Li, Ding Zhao

Abstract: Generating safety-critical scenarios, which are crucial yet difficult to collect, provides an effective way to evaluate the robustness of autonomous driving systems. However, the diversity of scenarios and efficiency of generation methods are heavily restricted by the rareness and structure of safety-critical scenarios. Therefore, existing generative models that only estimate distributions from observational data are not satisfactory for solving this problem. In this paper, we integrate causality as a prior into the scenario generation and propose a flow-based generative framework, Causal Autoregressive Flow (CausalAF). CausalAF encourages the generative model to uncover and follow the causal relationship among generated objects via novel causal masking operations instead of searching the sample only from observational data. By learning the cause-and-effect mechanism of how the generated scenario causes risk situations rather than just learning correlations from data, CausalAF significantly improves learning efficiency. Extensive experiments on three heterogeneous traffic scenarios illustrate that CausalAF requires far fewer optimization resources to effectively generate safety-critical scenarios. We also show that using generated scenarios as additional training samples empirically improves the robustness of autonomous driving algorithms.

Submitted 19 August, 2023; v1 submitted 26 October, 2021; originally announced October 2021.

Comments: Accepted to CoRL 2022

arXiv:2110.12211 [pdf, other]

doi 10.3389/fnins.2021.726582

ES-ImageNet: A Million Event-Stream Classification Dataset for Spiking Neural Networks

Authors: Yihan Lin, Wei Ding, Shaohua Qiang, Lei Deng, Guoqi Li

Abstract: With event-driven algorithms, especially the spiking neural networks (SNNs), achieving continuous improvement in neuromorphic vision processing, a more challenging event-stream-dataset is urgently needed. However, it is well known that creating an ES-dataset is a time-consuming and costly task with neuromorphic cameras like dynamic vision sensors (DVS). In this work, we propose a fast and effective algorithm termed Omnidirectional Discrete Gradient (ODG) to convert the popular computer vision dataset ILSVRC2012 into its event-stream (ES) version, generating about 1,300,000 frame-based images into ES-samples in 1000 categories. In this way, we propose an ES-dataset called ES-ImageNet, which is dozens of times larger than other neuromorphic classification datasets at present and completely generated by the software. The ODG algorithm implements an image motion to generate local value changes with discrete gradient information in different directions, providing a low-cost and high-speed way for converting frame-based images into event streams, along with Edge-Integral to reconstruct the high-quality images from event streams. Furthermore, we analyze the statistics of the ES-ImageNet in multiple ways, and a performance benchmark of the dataset is also provided using both famous deep neural network algorithms and spiking neural network algorithms. We believe that this work shall provide a new large-scale benchmark dataset for SNNs and neuromorphic vision.

Submitted 23 October, 2021; originally announced October 2021.

arXiv:2110.07197 [pdf, other]

Semi-supervised Multi-task Learning for Semantics and Depth

Authors: Yufeng Wang, Yi-Hsuan Tsai, Wei-Chih Hung, Wenrui Ding, Shuo Liu, Ming-Hsuan Yang

Abstract: Multi-Task Learning (MTL) aims to enhance the model generalization by sharing representations between related tasks for better performance. Typical MTL methods are jointly trained with the complete multitude of ground-truths for all tasks simultaneously. However, one single dataset may not contain the annotations for each task of interest. To address this issue, we propose the Semi-supervised Multi-Task Learning (SemiMTL) method to leverage the available supervisory signals from different datasets, particularly for semantic segmentation and depth estimation tasks. To this end, we design an adversarial learning scheme in our semi-supervised training by leveraging unlabeled data to optimize all the task branches simultaneously and accomplish all tasks across datasets with partial annotations. We further present a domain-aware discriminator structure with various alignment formulations to mitigate the domain discrepancy issue among datasets. Finally, we demonstrate the effectiveness of the proposed method to learn across different datasets on challenging street view and remote sensing benchmarks.

Submitted 14 October, 2021; originally announced October 2021.

Comments: Accepted at WACV 2022

arXiv:2109.10066 [pdf, other]

doi 10.1038/s41535-021-00381-y

Electronic structure and signature of Tomonaga-Luttinger liquid state in epitaxial CoSb$_{1-x}$ nanoribbons

Authors: Rui Lou, Minyinan Lei, Wenjun Ding, Wentao Yang, Xiaoyang Chen, Ran Tao, Shuyue Ding, Xiaoping Shen, Yajun Yan, Ping Cui, Haichao Xu, Rui Peng, Tong Zhang, Zhenyu Zhang, Donglai Feng

Abstract: Recently, monolayer CoSb/SrTiO$_3$ has been proposed as a candidate harboring interfacial superconductivity in analogy with monolayer FeSe/SrTiO$_3$. Experimentally, while the CoSb-based compounds manifesting as nanowires and thin films have been realized on SrTiO$_3$ substrates, serving as a rich playground, their electronic structures are still unknown and yet to be resolved. Here, we have fabricated CoSb$_{1-x}$ nanoribbons with quasi-one-dimensional stripes on SrTiO$_3$(001) substrates using molecular beam epitaxy, and investigated the electronic structure by in situ angle-resolved photoemission spectroscopy. Straight Fermi surfaces without lateral dispersions are observed. CoSb$_{1-x}$/SrTiO$_3$ is slightly hole doped, where the interfacial charge transfer is opposite to that in monolayer FeSe/SrTiO$_3$. The spectral weight near Fermi level exhibits power-law-like suppression and obeys a universal temperature scaling, serving as the signature of Tomonaga-Luttinger liquid (TLL) state. The obtained TLL parameter of $\sim$0.21 shows the underlying strong correlations. Our results not only suggest CoSb$_{1-x}$ nanoribbon as a representative TLL system, but also provide clues for further investigations on the CoSb-related interface.

Submitted 21 September, 2021; originally announced September 2021.

Comments: 27 pages, 3 figures

Journal ref: npj Quantum Materials 6, 79 (2021)

arXiv:2109.02171 [pdf, other]

Right Ventricular Segmentation from Short- and Long-Axis MRIs via Information Transition

Authors: Lei Li, Wangbin Ding, Liqun Huang, Xiahai Zhuang

Abstract: Right ventricular (RV) segmentation from magnetic resonance imaging (MRI) is a crucial step for cardiac morphology and function analysis. However, automatic RV segmentation from MRI is still challenging, mainly due to the heterogeneous intensity, the complex variable shapes, and the unclear RV boundary. Moreover, current methods for RV segmentation tend to suffer from performance degradation at the basal and apical slices of MRI. In this work, we propose an automatic RV segmentation framework, where the information from long-axis (LA) views is utilized to assist the segmentation of short-axis (SA) views via information transition. Specifically, we employ the transformed segmentation from LA views as prior information to extract the ROI from SA views for better segmentation. The information transition aims to remove the surrounding ambiguous regions in the SA views. We tested our model on a public dataset with 360 multi-center, multi-vendor and multi-disease subjects that consist of both LA and SA MRIs. Our experimental results show that including LA views can be effective in improving the accuracy of the SA segmentation. Our model is publicly available at https://github.com/NanYoMy/MMs-2.

Submitted 5 September, 2021; originally announced September 2021.

Comments: None

arXiv:2109.00288 [pdf, other]

Exploring variational quantum eigensolver ansatzes for the long-range XY model

Authors: Jia-Bin You, Dax Enshan Koh, Jian Feng Kong, Wen-Jun Ding, Ching Eng Png, Lin Wu

Abstract: Finding the ground state energy and wavefunction of a quantum many-body system is a key problem in quantum physics and chemistry. We study this problem for the long-range XY model by using the variational quantum eigensolver (VQE) algorithm. We consider VQE ansatzes with full and linear entanglement structures consisting of different building gates: the CNOT gate, the controlled-rotation (CRX) gate, and the two-qubit rotation (TQR) gate. We find that the full-entanglement CRX and TQR ansatzes can sufficiently describe the ground state energy of the long-range XY model. In contrast, only the full-entanglement TQR ansatz can represent the ground state wavefunction with a fidelity close to one. In addition, we find that instead of using full-entanglement ansatzes, restricted-entanglement ansatzes where entangling gates are applied only between qubits that are a fixed distance from each other already suffice to give acceptable solutions. Using the entanglement entropy to characterize the expressive powers of the VQE ansatzes, we show that the full-entanglement TQR ansatz has the highest expressive power among them.

Submitted 1 August, 2022; v1 submitted 1 September, 2021; originally announced September 2021.

arXiv:2109.00181 [pdf, other]

CTAL: Pre-training Cross-modal Transformer for Audio-and-Language Representations

Authors: Hang Li, Yu Kang, Tianqiao Liu, Wenbiao Ding, Zitao Liu

Abstract: Existing audio-language task-specific predictive approaches focus on building complicated late-fusion mechanisms. However, these models face challenges of overfitting with limited labels and low model generalization abilities. In this paper, we present a Cross-modal Transformer for Audio-and-Language, i.e., CTAL, which aims to learn the intra-modality and inter-modality connections between audio and language through two proxy tasks on a large amount of audio-and-language pairs: masked language modeling and masked cross-modal acoustic modeling. After fine-tuning our pre-trained model on multiple downstream audio-and-language tasks, we observe significant improvements across various tasks, such as emotion classification, sentiment analysis, and speaker verification. On this basis, we further propose a specially-designed fusion mechanism that can be used in the fine-tuning phase, which allows our pre-trained model to achieve better performance. Lastly, we present detailed ablation studies to show that both our novel cross-modality fusion component and audio-language pre-training methods significantly contribute to the promising results.

Submitted 1 September, 2021; originally announced September 2021.

Comments: The 2021 Conference on Empirical Methods in Natural Language Processing

arXiv:2108.13658 [pdf, other]

Automatic Rule Generation for Time Expression Normalization

Authors: Wentao Ding, Jianhao Chen, Jinmao Li, Yuzhong Qu

Abstract: The understanding of time expressions includes two sub-tasks: recognition and normalization. In recent years, significant progress has been made in the recognition of time expressions while research on normalization has lagged behind. Existing SOTA normalization methods highly rely on rules or grammars designed by experts, which limits their performance on emerging corpora, such as social media texts. In this paper, we model time expression normalization as a sequence of operations to construct the normalized temporal value, and we present a novel method called ARTime, which can automatically generate normalization rules from training data without expert interventions. Specifically, ARTime automatically captures possible operation sequences from annotated data and generates normalization rules on time expressions with common surface forms. The experimental results show that ARTime can significantly surpass SOTA methods on the Tweets benchmark, and achieves competitive results with existing expert-engineered rule methods on the TempEval-3 benchmark.

Submitted 10 October, 2022; v1 submitted 31 August, 2021; originally announced August 2021.

Comments: Accepted to Findings of EMNLP 2021

arXiv:2108.10124 [pdf, other]

Projections of Tropical Fermat-Weber Points

Authors: Weiyi Ding, Xiaoxian Tang

Abstract: In the tropical projective torus, it is not guaranteed that the projection of a Fermat-Weber point of a given data set is a Fermat-Weber point of the projection of the data set. In this paper, we focus on the projection on the tropical triangle (the three-point tropical convex hull), and we develop one algorithm (Algorithm 1) and its improved version (Algorithm 4), such that for a given data set in the tropical projective torus, these algorithms output a tropical triangle, on which the projection of a Fermat-Weber point of the data set is a Fermat-Weber point of the projection of the data set. We implement these algorithms in R and test how they work with random data sets. The experimental results show that these algorithms succeed with a much higher probability than choosing the tropical triangle randomly, that the success rate of these two algorithms is stable while data sets change randomly, and that Algorithm 4 outputs the results much faster than Algorithm 1 on average.

Submitted 23 August, 2021; originally announced August 2021.

Comments: 21 pages, 5 figures, 4 tables

MSC Class: 14T90; 62R07; 68R01

arXiv:2108.09502 [pdf, other]

Perturbation analysis on T-eigenvalues of third-order tensors

Authors: Changxin Mo, Weiyang Ding, Yimin Wei

Abstract: Perturbation analysis has emerged as a significant concern across multiple disciplines, with notable advancements being achieved, particularly in the realm of matrices. This study centers on specific aspects pertaining to tensor T-eigenvalues within the context of the tensor-tensor multiplication. Initially, an analytical perturbation analysis is introduced to explore the sensitivity of T-eigenvalues. In the case of third-order tensors featuring square frontal slices, we extend the classical Gershgorin disc theorem and show that all T-eigenvalues are located inside a union of Gershgorin discs. Additionally, we extend the Bauer-Fike theorem to encompass F-diagonalizable tensors and present two modified versions applicable to more general scenarios. The tensor case of the Kahan theorem, which accounts for general perturbations on Hermite tensors, is also investigated. Furthermore, we propose the concept of pseudospectra for third-order tensors based on tensor-tensor multiplication. We develop four definitions that are equivalent under the spectral norm to characterize tensor $\varepsilon$-pseudospectra. Additionally, we present several pseudospectral properties. To provide visualizations, several numerical examples are also provided to illustrate the $\varepsilon$-pseudospectra of specific tensors at different levels.

Submitted 11 April, 2024; v1 submitted 21 August, 2021; originally announced August 2021.

Comments: 32 pages, 5 pages

MSC Class: 15A18; 15A69; 90C22

arXiv:2108.09456 [pdf]

A Modular Design of Continuously Tunable Full Color Plasmonic Pixels with Broken Rotational Symmetry

Authors: Rui Feng, Hao Wang, Yongyin Cao, Ray J. H. Ng, You Sin Tan, Yanxia Zhang, Fangkui Sun, Cheng-Wei Qiu, Joel K. W. Yang, Weiqiang Ding

Abstract: Color tuning is a fascinating and indispensable property in applications such as advanced display, active camouflaging and information encryption. Thus far, a variety of reconfigurable approaches have been implemented to achieve color change. However, it is still a challenge to enable a continuous color tuning over the entire hue range in a simple, stable and rapid manner without changes in configuration and material properties. Here, we demonstrate an all-optical continuously tunable plasmonic pixel scheme via a modular design approach to realize polarization-controlled full color tuning by breaking the intrinsic symmetry of the unit cell layout. The polarization-controlled full color tunable plasmonic pixels consist of three different types of color modules oriented at an angle of 60° with respect to each other, corresponding to three subtractive primary colors. Without changing the structural properties or surrounding environment, the structural colors can be continuously and precisely tuned across all hues by illuminating linearly polarized light with different polarization directions. Meanwhile, the plasmonic pixels can be flexibly customized for various color tuning processes, such as different initial output colors and color tuning sequences, through the appropriate choice of component modules and the elaborate design of module layouts. Furthermore, we extend the color tuning to achromatic colors, white or black, with the utilization of a single module or the introduction of a black module. The proposed polarization-controlled full color tunable plasmonic pixels hold considerable potential to function as next-generation color pixels integrated with liquid-crystal polarizers.

Submitted 21 August, 2021; originally announced August 2021.

Comments: 39 pages, 17 figures

arXiv:2108.09076 [pdf, other]

PASTO: Strategic Parameter Optimization in Recommendation Systems -- Probabilistic is Better than Deterministic

Authors: Weicong Ding, Hanlin Tang, Jingshuo Feng, Lei Yuan, Sen Yang, Guangxu Yang, Jie Zheng, Jing Wang, Qiang Su, Dong Zheng, Xuezhong Qiu, Yongqi Liu, Yuxuan Chen, Yang Liu, Chao Song, Dongying Kong, Kai Ren, Peng Jiang, Qiao Lian, Ji Liu

Abstract: Real-world recommendation systems often consist of two phases. In the first phase, multiple predictive models produce the probability of different immediate user actions. In the second phase, these predictions are aggregated according to a set of 'strategic parameters' to meet a diverse set of business goals, such as longer user engagement, higher revenue potential, or more community/network interactions. In addition to building accurate predictive models, it is also crucial to optimize this set of 'strategic parameters' so that primary goals are optimized while secondary guardrails are not hurt. In this setting with multiple and constrained goals, this paper discovers that a probabilistic strategic parameter regime can achieve better value compared to the standard regime of finding a single deterministic parameter. The new probabilistic regime is to learn the best distribution over strategic parameter choices and sample one strategic parameter from the distribution when each user visits the platform. To pursue the optimal probabilistic solution, we formulate the problem as a stochastic compositional optimization problem, in which the unbiased stochastic gradient is unavailable. Our approach is applied in a popular social network platform with hundreds of millions of daily users and achieves a +0.22% lift in user engagement in a recommendation task and a +1.7% lift in revenue in an advertising optimization scenario compared to using the best deterministic parameter strategy.

Submitted 20 August, 2021; originally announced August 2021.

arXiv:2108.07993 [pdf, other]

EPSILON: An Efficient Planning System for Automated Vehicles in Highly Interactive Environments

Authors: Wenchao Ding, Lu Zhang, Jing Chen, Shaojie Shen

Abstract: In this paper, we present an Efficient Planning System for automated vehicles In highLy interactive envirONments (EPSILON). EPSILON is an efficient interaction-aware planning system for automated driving, and is extensively validated in both simulation and real-world dense city traffic. It follows a hierarchical structure with an interactive behavior planning layer and an optimization-based motion planning layer. The behavior planning is formulated from a partially observable Markov decision process (POMDP), but is much more efficient than naively applying a POMDP to the decision-making problem. The key to efficiency is guided branching in both the action space and observation space, which decomposes the original problem into a limited number of closed-loop policy evaluations. Moreover, we introduce a new driver model with a safety mechanism to overcome the risk induced by the potential imperfectness of prior knowledge. For motion planning, we employ a spatio-temporal semantic corridor (SSC) to model the constraints posed by complex driving environments in a unified way. Based on the SSC, a safe and smooth trajectory is optimized, complying with the decision provided by the behavior planner. We validate our planning system in both simulations and real-world dense traffic, and the experimental results show that our EPSILON achieves human-like driving behaviors in highly interactive traffic flow smoothly and safely without being over-conservative compared to the existing planning methods.

Submitted 18 August, 2021; originally announced August 2021.

Comments: Accepted by the IEEE Transactions on Robotics (T-RO)

arXiv:2108.02476 [pdf, other]

Colorectal Polyp Classification from White-light Colonoscopy Images via Domain Alignment

Authors: Qin Wang, Hui Che, Weizhen Ding, Li Xiang, Guanbin Li, Zhen Li, Shuguang Cui

Abstract: Differentiation of colorectal polyps is an important clinical examination. A computer-aided diagnosis system is required to assist accurate diagnosis from colonoscopy images. Most previous studies attempt to develop models for polyp differentiation using Narrow-Band Imaging (NBI) or other enhanced images. However, the wide range of these models' applications for clinical work has been limited by the lagging of imaging techniques. Thus, we propose a novel framework based on a teacher-student architecture for accurate colorectal polyp classification (CPC) directly using white-light (WL) colonoscopy images in the examination. In practice, during training, the auxiliary NBI images are utilized to train a teacher network and guide the student network to acquire richer feature representation from WL images. The feature transfer is realized by domain alignment and contrastive learning. Eventually the final student network has the ability to extract aligned features from only WL images to facilitate the CPC task. Besides, we release the first publicly available paired CPC dataset containing WL-NBI pairs for the alignment training. Quantitative and qualitative evaluation indicates that the proposed method outperforms the previous methods in CPC, improving the accuracy by 5.6% with very fast speed.

Submitted 5 August, 2021; originally announced August 2021.

Comments: Accepted in MICCAI-21

arXiv:2107.07958 [pdf, other]

Temporal-aware Language Representation Learning From Crowdsourced Labels

Authors: Yang Hao, Xiao Zhai, Wenbiao Ding, Zitao Liu

Abstract: Learning effective language representations from crowdsourced labels is crucial for many real-world machine learning tasks. A challenging aspect of this problem is that the quality of crowdsourced labels suffers from high intra- and inter-observer variability. Since high-capacity deep neural networks can easily memorize all disagreements among crowdsourced labels, directly applying existing supervised language representation learning algorithms may yield suboptimal solutions. In this paper, we propose TACMA, a temporal-aware language representation learning heuristic for crowdsourced labels with multiple annotators. The proposed approach (1) explicitly models the intra-observer variability with an attention mechanism; (2) computes and aggregates per-sample confidence scores from multiple workers to address the inter-observer disagreements. The proposed heuristic is extremely easy to implement in around 5 lines of code. The proposed heuristic is evaluated on four synthetic and four real-world data sets. The results show that our approach outperforms a wide range of state-of-the-art baselines in terms of prediction accuracy and AUC. To encourage reproducible results, we make our code publicly available at https://github.com/CrowdsourcingMining/TACMA.