Search | arXiv e-print repository

Self-Correcting Self-Consuming Loops for Generative Model Training

Authors: Nate Gillman, Michael Freeman, Daksh Aggarwal, Chia-Hong Hsu, Calvin Luo, Yonglong Tian, Chen Sun

Abstract: As synthetic data becomes higher quality and proliferates on the internet, machine learning models are increasingly trained on a mix of human- and machine-generated data. Despite the successful stories of using synthetic data for representation learning, using synthetic data for generative model training creates "self-consuming loops" which may lead to training instability or even collapse, unless certain conditions are met. Our paper aims to stabilize self-consuming generative model training. Our theoretical results demonstrate that by introducing an idealized correction function, which maps a data point to be more likely under the true data distribution, self-consuming loops can be made exponentially more stable. We then propose self-correction functions, which rely on expert knowledge (e.g. the laws of physics programmed in a simulator), and aim to approximate the idealized corrector automatically and at scale. We empirically validate the effectiveness of self-correcting self-consuming loops on the challenging human motion synthesis task, and observe that it successfully avoids model collapse, even when the ratio of synthetic data to real data is as high as 100%.

Submitted 10 June, 2024; v1 submitted 10 February, 2024; originally announced February 2024.

Comments: Camera ready version (ICML 2024). Code at https://nategillman.com/sc-sc.html

arXiv:2312.12191 [pdf, other]

CUDC: A Curiosity-Driven Unsupervised Data Collection Method with Adaptive Temporal Distances for Offline Reinforcement Learning

Authors: Chenyu Sun, Hangwei Qian, Chunyan Miao

Abstract: Offline reinforcement learning (RL) aims to learn an effective policy from a pre-collected dataset. Most existing works are to develop sophisticated learning algorithms, with less emphasis on improving the data collection process. Moreover, it is even challenging to extend the single-task setting and collect a task-agnostic dataset that allows an agent to perform multiple downstream tasks. In this paper, we propose a Curiosity-driven Unsupervised Data Collection (CUDC) method to expand feature space using adaptive temporal distances for task-agnostic data collection and ultimately improve learning efficiency and capabilities for multi-task offline RL. To achieve this, CUDC estimates the probability of the k-step future states being reachable from the current states, and adapts how many steps into the future that the dynamics model should predict. With this adaptive reachability mechanism in place, the feature representation can be diversified, and the agent can navigate itself to collect higher-quality data with curiosity. Empirically, CUDC surpasses existing unsupervised methods in efficiency and learning performance in various downstream offline RL tasks of the DeepMind control suite.

Submitted 19 December, 2023; originally announced December 2023.

Comments: Accepted at AAAI-24

arXiv:2312.11927 [pdf, other]

Empowering Dual-Level Graph Self-Supervised Pretraining with Motif Discovery

Authors: Pengwei Yan, Kaisong Song, Zhuoren Jiang, Yangyang Kang, Tianqianjin Lin, Changlong Sun, Xiaozhong Liu

Abstract: While self-supervised graph pretraining techniques have shown promising results in various domains, their application still experiences challenges of limited topology learning, human knowledge dependency, and incompetent multi-level interactions. To address these issues, we propose a novel solution, Dual-level Graph self-supervised Pretraining with Motif discovery (DGPM), which introduces a unique dual-level pretraining structure that orchestrates node-level and subgraph-level pretext tasks. Unlike prior approaches, DGPM autonomously uncovers significant graph motifs through an edge pooling module, aligning learned motif similarities with graph kernel-based similarities. A cross-matching task enables sophisticated node-motif interactions and novel representation learning. Extensive experiments on 15 datasets validate DGPM's effectiveness and generalizability, outperforming state-of-the-art methods in unsupervised representation learning and transfer learning settings. The autonomously discovered motifs demonstrate the potential of DGPM to enhance robustness and interpretability.

Submitted 19 December, 2023; originally announced December 2023.

Comments: 14 pages, 6 figures, accepted by AAAI'24

arXiv:2312.05757 [pdf, ps, other]

doi 10.1016/j.ipm.2023.103600

Towards Human-like Perception: Learning Structural Causal Model in Heterogeneous Graph

Authors: Tianqianjin Lin, Kaisong Song, Zhuoren Jiang, Yangyang Kang, Weikang Yuan, Xurui Li, Changlong Sun, Cui Huang, Xiaozhong Liu

Abstract: Heterogeneous graph neural networks have become popular in various domains. However, their generalizability and interpretability are limited due to the discrepancy between their inherent inference flows and human reasoning logic or underlying causal relationships for the learning problem. This study introduces a novel solution, HG-SCM (Heterogeneous Graph as Structural Causal Model). It can mimic the human perception and decision process through two key steps: constructing intelligible variables based on semantics derived from the graph schema and automatically learning task-level causal relationships among these variables by incorporating advanced causal discovery techniques. We compared HG-SCM to seven state-of-the-art baseline models on three real-world datasets, under three distinct and ubiquitous out-of-distribution settings. HG-SCM achieved the highest average performance rank with minimal standard deviation, substantiating its effectiveness and superiority in terms of both predictive power and generalizability. Additionally, the visualization and analysis of the auto-learned causal diagrams for the three tasks aligned well with domain knowledge and human cognition, demonstrating prominent interpretability. HG-SCM's human-like nature and its enhanced generalizability and interpretability make it a promising solution for special scenarios where transparency and trustworthiness are paramount.

Submitted 9 December, 2023; originally announced December 2023.

Comments: 28 pages, 10 figures, 6 tables, accepted by Information Processing & Management

Journal ref: Information Processing & Management, 60 (2024) 1-21

arXiv:2312.00308 [pdf, other]

A knowledge-based data-driven (KBDD) framework for all-day identification of cloud types using satellite remote sensing

Authors: Longfeng Nie, Yuntian Chen, Mengge Du, Changqi Sun, Dongxiao Zhang

Abstract: Cloud types, as a type of meteorological data, are of particular significance for evaluating changes in rainfall, heatwaves, water resources, floods and droughts, food security and vegetation cover, as well as land use. In order to effectively utilize high-resolution geostationary observations, a knowledge-based data-driven (KBDD) framework for all-day identification of cloud types based on spectral information from Himawari-8/9 satellite sensors is designed. And a novel, simple and efficient network, named CldNet, is proposed. Compared with widely used semantic segmentation networks, including SegNet, PSPNet, DeepLabV3+, UNet, and ResUnet, our proposed model CldNet with an accuracy of 80.89 ± 2.18% is state-of-the-art in identifying cloud types and has increased by 32%, 46%, 22%, 2%, and 39%, respectively. With the assistance of auxiliary information (e.g., satellite zenith/azimuth angle, solar zenith/azimuth angle), the accuracy of CldNet-W using visible and near-infrared bands and CldNet-O not using visible and near-infrared bands on the test dataset is 82.23 ± 2.14% and 73.21 ± 2.02%, respectively. Meanwhile, the total parameters of CldNet are only 0.46M, making it easy for edge deployment. More importantly, the trained CldNet without any fine-tuning can predict cloud types with higher spatial resolution using satellite spectral data with spatial resolution 0.02° × 0.02°, which indicates that CldNet possesses a strong generalization ability.
In aggregate, the KBDD framework using CldNet is a highly effective cloud-type identification system capable of providing a high-fidelity, all-day, spatiotemporal cloud-type database for many climate assessment fields.

Submitted 30 November, 2023; originally announced December 2023.

arXiv:2310.02423 [pdf, other]

Delta-AI: Local objectives for amortized inference in sparse graphical models

Authors: Jean-Pierre Falet, Hae Beom Lee, Nikolay Malkin, Chen Sun, Dragos Secrieru, Thomas Jiralerspong, Dinghuai Zhang, Guillaume Lajoie, Yoshua Bengio

Abstract: We present a new algorithm for amortized inference in sparse probabilistic graphical models (PGMs), which we call $Δ$-amortized inference ($Δ$-AI). Our approach is based on the observation that when the sampling of variables in a PGM is seen as a sequence of actions taken by an agent, sparsity of the PGM enables local credit assignment in the agent's policy learning objective. This yields a local constraint that can be turned into a local loss in the style of generative flow networks (GFlowNets) that enables off-policy training but avoids the need to instantiate all the random variables for each parameter update, thus speeding up training considerably. The $Δ$-AI objective matches the conditional distribution of a variable given its Markov blanket in a tractable learned sampler, which has the structure of a Bayesian network, with the same conditional distribution under the target PGM. As such, the trained sampler recovers marginals and conditional distributions of interest and enables inference of partial subsets of variables. We illustrate $Δ$-AI's effectiveness for sampling from synthetic PGMs and training latent variable models with sparse factor structure.

Submitted 13 March, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

Comments: ICLR 2024; 19 pages, code: https://github.com/GFNOrg/Delta-AI/

arXiv:2310.00817 [pdf, other]

Learning to Make Adherence-Aware Advice

Authors: Guanting Chen, Xiaocheng Li, Chunlin Sun, Hanzhao Wang

Abstract: As artificial intelligence (AI) systems play an increasingly prominent role in human decision-making, challenges surface in the realm of human-AI interactions. One challenge arises from the suboptimal AI policies due to the inadequate consideration of humans disregarding AI recommendations, as well as the need for AI to provide advice selectively when it is most pertinent. This paper presents a sequential decision-making model that (i) takes into account the human's adherence level (the probability that the human follows/rejects machine advice) and (ii) incorporates a defer option so that the machine can temporarily refrain from making advice. We provide learning algorithms that learn the optimal advice policy and make advice only at critical time stamps. Compared to problem-agnostic reinforcement learning algorithms, our specialized learning algorithms not only enjoy better theoretical convergence properties but also show strong empirical performance.

Submitted 20 March, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

arXiv:2307.04402 [pdf]

Moving pattern-based modeling using a new type of interval ARX model

Authors: Changping Sun

Abstract: In this paper, firstly, to overcome the shortcoming of the traditional ARX model, a new operator between an interval number and a real matrix is defined, and then it is applied to the traditional ARX model to get a new type of structure interval ARX model that can deal with interval data, which is defined as the interval ARX model (IARX). Secondly, the IARX model is applied to moving pattern-based modeling. Finally, to verify the validity of the proposed modeling method, it is applied to a sintering process. The simulation results show that the moving pattern-based modeling using the new type of interval ARX model is robust to variation in parameters of the model, and that the performance of the modeling using the proposed IARX is superior to that of the previous work.

Submitted 12 July, 2023; v1 submitted 10 July, 2023; originally announced July 2023.

arXiv:2301.11260 [pdf, ps, other]

Maximum Optimality Margin: A Unified Approach for Contextual Linear Programming and Inverse Linear Programming

Authors: Chunlin Sun, Shang Liu, Xiaocheng Li

Abstract: In this paper, we study the predict-then-optimize problem where the output of a machine learning prediction task is used as the input of some downstream optimization problem, say, the objective coefficient vector of a linear program. The problem is also known as predictive analytics or contextual linear programming. The existing approaches largely suffer from either (i) optimization intractability (a non-convex objective function)/statistical inefficiency (a suboptimal generalization bound) or (ii) requiring strong condition(s) such as no constraint or loss calibration. We develop a new approach to the problem called \textit{maximum optimality margin} which designs the machine learning loss function by the optimality condition of the downstream optimization. The max-margin formulation enjoys both computational efficiency and good theoretical properties for the learning procedure. More importantly, our new approach only needs the observations of the optimal solution in the training data rather than the objective function, which makes it a new and natural approach to the inverse linear programming problem under both contextual and context-free settings; we also analyze the proposed method under both offline and online settings, and demonstrate its performance using numerical experiments.

Submitted 28 May, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

Comments: to be published in ICML 2023

arXiv:2205.00943 [pdf, other]

doi 10.24963/ijcai.2022/478

CCLF: A Contrastive-Curiosity-Driven Learning Framework for Sample-Efficient Reinforcement Learning

Authors: Chenyu Sun, Hangwei Qian, Chunyan Miao

Abstract: In reinforcement learning (RL), it is challenging to learn directly from high-dimensional observations, where data augmentation has recently been shown to remedy this via encoding invariances from raw pixels. Nevertheless, we empirically find that not all samples are equally important and hence simply injecting more augmented inputs may instead cause instability in Q-learning. In this paper, we approach this problem systematically by developing a model-agnostic Contrastive-Curiosity-Driven Learning Framework (CCLF), which can fully exploit sample importance and improve learning efficiency in a self-supervised manner. Facilitated by the proposed contrastive curiosity, CCLF is capable of prioritizing the experience replay, selecting the most informative augmented inputs, and more importantly regularizing the Q-function as well as the encoder to concentrate more on under-learned data. Moreover, it encourages the agent to explore with a curiosity-based reward. As a result, the agent can focus on more informative samples and learn representation invariances more efficiently, with significantly reduced augmented inputs. We apply CCLF to several base RL algorithms and evaluate on the DeepMind Control Suite, Atari, and MiniGrid benchmarks, where our approach demonstrates superior sample efficiency and learning performances compared with other state-of-the-art methods.

Submitted 3 May, 2022; v1 submitted 2 May, 2022; originally announced May 2022.

Comments: Full paper with supplementary material, accepted by IJCAI 2022. Acknowledgements and affiliations are updated

arXiv:2203.04511 [pdf, other]

Revealing the Excitation Causality between Climate and Political Violence via a Neural Forward-Intensity Poisson Process

Authors: Schyler C. Sun, Bailu Jin, Zhuangkun Wei, Weisi Guo

Abstract: The causal mechanism between climate and political violence is fraught with complex mechanisms. Current quantitative causal models rely on one or more assumptions: (1) the climate drivers persistently generate conflict, (2) the causal mechanisms have a linear relationship with the conflict generation parameter, and/or (3) there is sufficient data to inform the prior distribution. Yet, we know conflict drivers often excite a social transformation process which leads to violence (e.g., drought forces agricultural producers to join urban militia), but further climate effects do not necessarily contribute to further violence. Therefore, not only is this bifurcation relationship highly non-linear, there is also often a lack of data to support prior assumptions for high resolution modeling. Here, we aim to overcome the aforementioned causal modeling challenges by proposing a neural forward-intensity Poisson process (NFIPP) model. The NFIPP is designed to capture the potential non-linear causal mechanism in climate induced political violence, whilst being robust to sparse and timing-uncertain data. Our results span 20 recent years and reveal an excitation-based causal link between extreme climate events and political violence across diverse countries. Our climate-induced conflict model results are cross-validated against qualitative climate vulnerability indices. Furthermore, we label historical events that either improve or reduce our predictability gain, demonstrating the importance of domain expertise in informing interpretation.

Submitted 8 March, 2022; originally announced March 2022.

arXiv:2201.13259 [pdf, other]

Trajectory balance: Improved credit assignment in GFlowNets

Authors: Nikolay Malkin, Moksh Jain, Emmanuel Bengio, Chen Sun, Yoshua Bengio

Abstract: Generative flow networks (GFlowNets) are a method for learning a stochastic policy for generating compositional objects, such as graphs or strings, from a given unnormalized density by sequences of actions, where many possible action sequences may lead to the same object. We find previously proposed learning objectives for GFlowNets, flow matching and detailed balance, which are analogous to temporal difference learning, to be prone to inefficient credit propagation across long action sequences. We thus propose a new learning objective for GFlowNets, trajectory balance, as a more efficient alternative to previously used objectives. We prove that any global minimizer of the trajectory balance objective can define a policy that samples exactly from the target distribution. In experiments on four distinct domains, we empirically demonstrate the benefits of the trajectory balance objective for GFlowNet convergence, diversity of generated samples, and robustness to long action sequences and large action spaces.

Submitted 4 October, 2023; v1 submitted 31 January, 2022; originally announced January 2022.

Comments: NeurIPS 2022; see footnotes for code; v3 fixes minor errata
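
Note: the abstract above names the trajectory balance objective without stating it. As a rough sketch, it is commonly written as the squared log-ratio below for a complete trajectory $τ = (s_0 \to s_1 \to \dots \to s_n = x)$. The notation is an assumption and does not appear in the listing: $Z_θ$ is a learned estimate of the normalizing constant, $P_F$ and $P_B$ are the forward and backward policies, and $R(x)$ is the unnormalized target density at the terminal object $x$.

```latex
\mathcal{L}_{\mathrm{TB}}(\tau)
  = \left(
      \log \frac{Z_\theta \prod_{t=1}^{n} P_F(s_t \mid s_{t-1}; \theta)}
                {R(x) \prod_{t=1}^{n} P_B(s_{t-1} \mid s_t; \theta)}
    \right)^{2}
```

Under this reading, the loss is zero on every trajectory exactly when the forward flow along the trajectory matches the reward-weighted backward flow, which is consistent with the abstract's claim that a global minimizer samples from the target distribution.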

arXiv:2110.05428 [pdf, other]

Learning Temporally Causal Latent Processes from General Temporal Data

Authors: Weiran Yao, Yuewen Sun, Alex Ho, Changyin Sun, Kun Zhang

Abstract: Our goal is to recover time-delayed latent causal variables and identify their relations from measured temporal data. Estimating causally-related latent variables from observations is particularly challenging as the latent variables are not uniquely recoverable in the most general case. In this work, we consider both a nonparametric, nonstationary setting and a parametric setting for the latent processes and propose two provable conditions under which temporally causal latent processes can be identified from their nonlinear mixtures. We propose LEAP, a theoretically-grounded framework that extends Variational AutoEncoders (VAEs) by enforcing our conditions through proper constraints in causal process prior. Experimental results on various datasets demonstrate that temporally causal latent processes are reliably identified from observed variables under different dependency structures and that our approach considerably outperforms baselines that do not properly leverage history or nonstationarity information. This demonstrates that using temporal information to learn latent processes from their invertible nonlinear mixtures in an unsupervised manner, for which we believe our work is one of the first, seems promising even without sparsity or minimality assumptions.

Submitted 8 February, 2022; v1 submitted 11 October, 2021; originally announced October 2021.

Comments: ICLR 2022: https://openreview.net/forum?id=RDlLMjLJXdq

arXiv:2104.14281 [pdf]

doi 10.1177/20552076221089092

Leveraging Online Shopping Behaviors as a Proxy for Personal Lifestyle Choices: New Insights into Chronic Disease Prevention Literacy

Authors: Yongzhen Wang, Xiaozhong Liu, Katy Börner, Jun Lin, Yingnan Ju, Changlong Sun, Luo Si

Abstract: Objective: Ubiquitous internet access is reshaping the way we live, but it is accompanied by unprecedented challenges in preventing chronic diseases that are usually planted by long exposure to unhealthy lifestyles. This paper proposes leveraging online shopping behaviors as a proxy for personal lifestyle choices to improve chronic disease prevention literacy, targeted for times when e-commerce user experience has been assimilated into most people's everyday lives. Methods: Longitudinal query logs and purchase records from 15 million online shoppers were accessed, constructing a broad spectrum of lifestyle features covering various product categories and buyer personas. Using the lifestyle-related information preceding online shoppers' first purchases of specific prescription drugs, we could determine associations between their past lifestyle choices and whether they suffered from a particular chronic disease. Results: Novel lifestyle risk factors were discovered in two exemplars--depression and type 2 diabetes, most of which showed reasonable consistency with existing healthcare knowledge. Further, such empirical findings could be adopted to locate online shoppers at higher risk of these chronic diseases with decent accuracy [i.e., (area under the receiver operating characteristic curve) AUC=0.68 for depression and AUC=0.70 for type 2 diabetes], closely matching the performance of screening surveys benchmarked against medical diagnosis.
Conclusions: Mining online shopping behaviors can point medical experts to a series of lifestyle issues associated with chronic diseases that are less explored to date. Hopefully, unobtrusive chronic disease surveillance via e-commerce sites can grant consenting individuals a privilege to be connected more readily with the medical profession and sophistication.

Submitted 9 March, 2022; v1 submitted 29 April, 2021; originally announced April 2021.

Comments: 58 pages with appendices, 5 figures, 17 tables

arXiv:2010.12493 [pdf, other]

A Review of Deep Learning Methods for Irregularly Sampled Medical Time Series Data

Authors: Chenxi Sun, Shenda Hong, Moxian Song, Hongyan Li

Abstract: Irregularly sampled time series (ISTS) data has irregular temporal intervals between observations and different sampling rates between sequences. ISTS commonly appears in healthcare, economics, and geoscience. Especially in the medical environment, the widely used Electronic Health Records (EHRs) have abundant typical irregularly sampled medical time series (ISMTS) data. Developing deep learning methods on EHRs data is critical for personalized treatment, precise diagnosis and medical management. However, it is challenging to directly use deep learning models for ISMTS data. On the one hand, ISMTS data has the intra-series and inter-series relations. Both the local and global structures should be considered. On the other hand, methods should consider the trade-off between task accuracy and model complexity and remain generality and interpretability. So far, many existing works have tried to solve the above problems and have achieved good results. In this paper, we review these deep learning methods from the perspectives of technology and task. Under the technology-driven perspective, we summarize them into two categories - missing data-based methods and raw data-based methods. Under the task-driven perspective, we also summarize them into two categories - data imputation-oriented and downstream task-oriented. For each of them, we point out their advantages and disadvantages. Moreover, we implement some representative methods and compare them on four medical datasets with two tasks. Finally, we discuss the challenges and opportunities in this area.

Submitted 26 October, 2020; v1 submitted 23 October, 2020; originally announced October 2020.

Comments: 19 pages, 7 figures

arXiv:2008.11922 [pdf, other]

Time-based Sequence Model for Personalization and Recommendation Systems

Authors: Tigran Ishkhanov, Maxim Naumov, Xianjie Chen, Yan Zhu, Yuan Zhong, Alisson Gusatti Azzolini, Chonglin Sun, Frank Jiang, Andrey Malevich, Liang Xiong

Abstract: In this paper we develop a novel recommendation model that explicitly incorporates time information. The model relies on an embedding layer and TSL attention-like mechanism with inner products in different vector spaces, that can be thought of as a modification of multi-headed attention. This mechanism allows the model to efficiently treat sequences of user behavior of different length. We study the properties of our state-of-the-art model on statistically designed data set. Also, we show that it outperforms more complex models with longer sequence length on the Taobao User Behavior dataset.

Submitted 27 August, 2020; originally announced August 2020.

Comments: 17 pages, 7 figures

MSC Class: 68T05 ACM Class: I.2.6; I.5.0; H.3.3; H.3.4

arXiv:2006.11419 [pdf, other]

FISAR: Forward Invariant Safe Reinforcement Learning with a Deep Neural Network-Based Optimizer

Authors: Chuangchuang Sun, Dong-Ki Kim, Jonathan P. How

Abstract: This paper investigates reinforcement learning with constraints, which are indispensable in safety-critical environments. To drive the constraint violation to decrease monotonically, we take the constraints as Lyapunov functions and impose new linear constraints on the policy parameters' updating dynamics. As a result, the original safety set can be forward-invariant. However, because the new guaranteed-feasible constraints are imposed on the updating dynamics instead of the original policy parameters, classic optimization algorithms are no longer applicable. To address this, we propose to learn a generic deep neural network (DNN)-based optimizer to optimize the objective while satisfying the linear constraints. The constraint-satisfaction is achieved via projection onto a polytope formulated by multiple linear inequality constraints, which can be solved analytically with our newly designed metric. To the best of our knowledge, this is the first DNN-based optimizer for constrained optimization with the forward invariance guarantee. We show that our optimizer trains a policy to decrease the constraint violation and maximize the cumulative reward monotonically. Results on numerical constrained optimization and obstacle-avoidance navigation validate the theoretical findings.

Submitted 5 May, 2021; v1 submitted 19 June, 2020; originally announced June 2020.

Comments: Accepted to ICML 2020 Workshop Theoretical Foundations of RL; Accepted to ICRA 2021

arXiv:2006.06057 [pdf, other]

Scalable Partial Explainability in Neural Networks via Flexible Activation Functions

Authors: Schyler C. Sun, Chen Li, Zhuangkun Wei, Antonios Tsourdos, Weisi Guo

Abstract: Achieving transparency in black-box deep learning algorithms is still an open challenge. High dimensional features and decisions given by deep neural networks (NN) require new algorithms and methods to expose their mechanisms. Current state-of-the-art NN interpretation methods (e.g. Saliency maps, DeepLIFT, LIME, etc.) focus more on the direct relationship between NN outputs and inputs rather than the NN structure and operations themselves. In current deep NN operations, there is uncertainty over the exact role played by neurons with fixed activation functions. In this paper, we achieve a partially explainable learning model by symbolically explaining the role of activation functions (AF) under a scalable topology. This is carried out by modeling the AFs as adaptive Gaussian Processes (GP), which sit within a novel scalable NN topology, based on the Kolmogorov-Arnold Superposition Theorem (KST). In this scalable NN architecture, the AFs are generated by GP interpolation between control points and can thus be tuned during the back-propagation procedure via gradient descent. The control points act as the core enabler to both local and global adjustability of AF, where the GP interpolation constrains the intrinsic autocorrelation to avoid over-fitting. We show that there exists a trade-off between the NN's expressive power and interpretation complexity, under linear KST topology scaling. To demonstrate this, we perform a case study on a binary classification dataset of banknote authentication. By quantitatively and qualitatively investigating the mapping relationship between inputs and output, our explainable model can provide interpretation over each of the one-dimensional attributes. These early results suggest that our model has the potential to act as the final interpretation layer for deep neural networks.

Submitted 10 June, 2020; originally announced June 2020.

arXiv:2006.05859 [pdf]

Trading Privacy for the Greater Social Good: How Did America React During COVID-19?

Authors: Anindya Ghose, Beibei Li, Meghanath Macha, Chenshuo Sun, Natasha Ying Zhang Foutz

Abstract: Digital contact tracing and analysis of social distancing from smartphone location data are two prime examples of non-therapeutic interventions used in many countries to mitigate the impact of the COVID-19 pandemic. While many understand the importance of trading personal privacy for the public good, others have been alarmed at the potential for surveillance via measures enabled through location tracking on smartphones. In our research, we analyzed massive yet atomic individual-level location data containing over 22 billion records from ten Blue (Democratic) and ten Red (Republican) cities in the U.S., based on which we present, herein, some of the first evidence of how Americans responded to the increasing concerns that government authorities, the private sector, and public health experts might use individual-level location data to track the COVID-19 spread. First, we found a significant decreasing trend of mobile-app location-sharing opt-out. Whereas areas with more Democrats were more privacy-concerned than areas with more Republicans before the advent of the COVID-19 pandemic, there was a significant decrease in the overall opt-out rates after COVID-19, and this effect was more salient among Democratic than Republican cities. Second, people who practiced social distancing (i.e., those who traveled less and interacted with fewer close contacts during the pandemic) were also less likely to opt out, whereas the converse was true for people who practiced less social distancing. This relationship also was more salient among Democratic than Republican cities. Third, high-income populations and males, compared with low-income populations and females, were more privacy-conscientious and more likely to opt out of location tracking.

Submitted 10 June, 2020; originally announced June 2020.

arXiv:2005.04259 [pdf, other]

VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation

Authors: Jiyang Gao, Chen Sun, Hang Zhao, Yi Shen, Dragomir Anguelov, Congcong Li, Cordelia Schmid

Abstract: Behavior prediction in dynamic, multi-agent systems is an important problem in the context of self-driving cars, due to the complex representations and interactions of road components, including moving agents (e.g. pedestrians and vehicles) and road context information (e.g. lanes, traffic lights). This paper introduces VectorNet, a hierarchical graph neural network that first exploits the spatial locality of individual road components represented by vectors and then models the high-order interactions among all components. In contrast to most recent approaches, which render trajectories of moving agents and road context information as bird-eye images and encode them with convolutional neural networks (ConvNets), our approach operates on a vector representation. By operating on the vectorized high definition (HD) maps and agent trajectories, we avoid lossy rendering and computationally intensive ConvNet encoding steps. To further boost VectorNet's capability in learning context features, we propose a novel auxiliary task to recover the randomly masked out map entities and agent trajectories based on their context. We evaluate VectorNet on our in-house behavior prediction benchmark and the recently released Argoverse forecasting dataset. Our method achieves on par or better performance than the competitive rendering approach on both benchmarks while saving over 70% of the model parameters with an order of magnitude reduction in FLOPs. It also outperforms the state of the art on the Argoverse dataset.

Submitted 8 May, 2020; originally announced May 2020.

Comments: CVPR 2020

arXiv:2004.13970 [pdf, other]

Directed Graph Convolutional Network

Authors: Zekun Tong, Yuxuan Liang, Changsheng Sun, David S. Rosenblum, Andrew Lim

Abstract: Graph Convolutional Networks (GCNs) have been widely used due to their outstanding performance in processing graph-structured data. However, the undirected graphs limit their application scope. In this paper, we extend spectral-based graph convolution to directed graphs by using first- and second-order proximity, which can not only retain the connection properties of the directed graph, but also expand the receptive field of the convolution operation. A new GCN model, called DGCN, is then designed to learn representations on the directed graph, leveraging both the first- and second-order proximity information. We empirically show that GCNs working only with DGCNs can encode more useful information from the graph and help achieve better performance when generalized to other models. Moreover, extensive experiments on citation networks and co-purchase datasets demonstrate the superiority of our model against the state-of-the-art methods.

Submitted 29 April, 2020; originally announced April 2020.

arXiv:2003.04994 [pdf, ps, other]

Masking Orchestration: Multi-task Pretraining for Multi-role Dialogue Representation Learning

Authors: Tianyi Wang, Yating Zhang, Xiaozhong Liu, Changlong Sun, Qiong Zhang

Abstract: Multi-role dialogue understanding comprises a wide range of diverse tasks such as question answering, act classification, dialogue summarization etc. While dialogue corpora are abundantly available, labeled data for specific learning tasks can be highly scarce and expensive. In this work, we investigate dialogue context representation learning with various types of unsupervised pretraining tasks where the training objectives are given naturally according to the nature of the utterance and the structure of the multi-role conversation. Meanwhile, in order to locate essential information for dialogue summarization/extraction, the pretraining process enables external knowledge integration. The proposed fine-tuned pretraining mechanism is comprehensively evaluated via three different dialogue datasets along with a number of downstream dialogue-mining tasks. Results show that the proposed pretraining mechanism significantly contributes to all the downstream tasks without discrimination to different encoders.

Submitted 26 February, 2020; originally announced March 2020.

Comments: 8 pages, 4 figures, AAAI2020

arXiv:2003.03609 [pdf, other]

RCC-Dual-GAN: An Efficient Approach for Outlier Detection with Few Identified Anomalies

Authors: Zhe Li, Chunhua Sun, Chunli Liu, Xiayu Chen, Meng Wang, Yezheng Liu

Abstract: Outlier detection is an important task in data mining and many technologies have been explored in various applications. However, due to the default assumption that outliers are non-concentrated, unsupervised outlier detection may not correctly detect group anomalies with higher density levels. As for the supervised outlier detection, although high detection rates and optimal parameters can usually be achieved, obtaining sufficient and correct labels is a time-consuming task. To address these issues, we focus on semi-supervised outlier detection with few identified anomalies, in the hope of using limited labels to achieve high detection accuracy. First, we propose a novel detection model Dual-GAN, which can directly utilize the potential information in identified anomalies to detect discrete outliers and partially identified group anomalies simultaneously. And then, considering the instances with similar output values may not all be similar in a complex data structure, we replace the two MO-GAN components in Dual-GAN with the combination of RCC and M-GAN (RCC-Dual-GAN). In addition, to deal with the evaluation of Nash equilibrium and the selection of optimal model, two evaluation indicators are created and introduced into the two models to make the detection process more intelligent. Extensive experiments on both benchmark datasets and two practical tasks demonstrate that our proposed approaches (i.e., Dual-GAN and RCC-Dual-GAN) can significantly improve the accuracy of outlier detection even with only a few identified anomalies. Moreover, compared with the two MO-GAN components in Dual-GAN, the network structure combining RCC and M-GAN has greater stability in various situations.

Submitted 7 March, 2020; originally announced March 2020.

arXiv:2001.07631 [pdf, other]

HRFA: High-Resolution Feature-based Attack

Authors: Zhixing Ye, Sizhe Chen, Peidong Zhang, Chengjin Sun, Xiaolin Huang

Abstract: Adversarial attacks have long been developed for revealing the vulnerability of Deep Neural Networks (DNNs) by adding imperceptible perturbations to the input. Most methods generate perturbations like normal noise, which is not interpretable and without semantic meaning. In this paper, we propose High-Resolution Feature-based Attack (HRFA), yielding authentic adversarial examples with up to $1024 \times 1024$ resolution. HRFA exerts attack by modifying the latent feature representation of the image, i.e., the gradients back propagate not only through the victim DNN, but also through the generative model that maps the feature space to the image space. In this way, HRFA generates adversarial examples that are high-resolution, realistic, and noise-free, and hence able to evade several denoising-based defenses. In the experiment, the effectiveness of HRFA is validated by attacking the object classification and face verification tasks with BigGAN and StyleGAN, respectively. The advantages of HRFA are verified from the high quality, high authenticity, and high attack success rate faced with defenses.

Submitted 22 October, 2020; v1 submitted 21 January, 2020; originally announced January 2020.

arXiv:2001.06325 [pdf, other]

Universal Adversarial Attack on Attention and the Resulting Dataset DAmageNet

Authors: Sizhe Chen, Zhengbao He, Chengjin Sun, Jie Yang, Xiaolin Huang

Abstract: Adversarial attacks on deep neural networks (DNNs) have been found for several years. However, the existing adversarial attacks have high success rates only when the information of the victim DNN is well-known or could be estimated by the structure similarity or massive queries. In this paper, we propose to Attack on Attention (AoA), a semantic property commonly shared by DNNs. AoA enjoys a significant increase in transferability when the traditional cross entropy loss is replaced with the attention loss. Since AoA alters the loss function only, it could be easily combined with other transferability-enhancement techniques and then achieve SOTA performance. We apply AoA to generate 50000 adversarial samples from ImageNet validation set to defeat many neural networks, and thus name the dataset as DAmageNet. 13 well-trained DNNs are tested on DAmageNet, and all of them have an error rate over 85%. Even with defenses or adversarial training, most models still maintain an error rate over 70% on DAmageNet. DAmageNet is the first universal adversarial dataset. It could be downloaded freely and serve as a benchmark for robustness testing and adversarial training.

Submitted 21 October, 2020; v1 submitted 16 January, 2020; originally announced January 2020.

Comments: accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

arXiv:2001.00784 [pdf, other]

Optimizing Wireless Systems Using Unsupervised and Reinforced-Unsupervised Deep Learning

Authors: Dong Liu, Chengjian Sun, Chenyang Yang, Lajos Hanzo

Abstract: Resource allocation and transceivers in wireless networks are usually designed by solving optimization problems subject to specific constraints, which can be formulated as variable or functional optimization. If the objective and constraint functions of a variable optimization problem can be derived, standard numerical algorithms can be applied for finding the optimal solution, which however incur high computational cost when the dimension of the variable is high. To reduce the on-line computational complexity, learning the optimal solution as a function of the environment's status by deep neural networks (DNNs) is an effective approach. DNNs can be trained under the supervision of optimal solutions, which however is not applicable to scenarios without models or to functional optimization where the optimal solutions are hard to obtain. If the objective and constraint functions are unavailable, reinforcement learning can be applied to find the solution of a functional optimization problem, which is however not tailored to optimization problems in wireless networks. In this article, we introduce unsupervised and reinforced-unsupervised learning frameworks for solving both variable and functional optimization problems without the supervision of the optimal solutions. When the mathematical model of the environment is completely known and the distribution of the environment's status is known or unknown, we can invoke an unsupervised learning algorithm. When the mathematical model of the environment is incomplete, we introduce reinforced-unsupervised learning algorithms that learn the model by interacting with the environment. Our simulation results confirm the applicability of these learning frameworks by taking a user association problem as an example.

Submitted 3 January, 2020; originally announced January 2020.

Comments: To appear in IEEE Network Magazine

arXiv:1912.07160 [pdf, other]

DAmageNet: A Universal Adversarial Dataset

Authors: Sizhe Chen, Xiaolin Huang, Zhengbao He, Chengjin Sun

Abstract: It is now well known that deep neural networks (DNNs) are vulnerable to adversarial attack. Adversarial samples are similar to the clean ones, but are able to cheat the attacked DNN to produce incorrect predictions in high confidence. But most of the existing adversarial attacks have high success rate only when the information of the attacked DNN is well-known or could be estimated by massive queries. A promising way is to generate adversarial samples with high transferability. In this way, we generate 96020 transferable adversarial samples from original ones in ImageNet. The average difference, measured by root mean squared deviation, is only around 3.8. However, the adversarial samples are misclassified by various models with an error rate up to 90%. Since the images are generated independently of the attacked DNNs, this is essentially a zero-query adversarial attack. We call the dataset DAmageNet, which is the first universal adversarial dataset that beats many models trained in ImageNet. By finding the drawbacks, DAmageNet could serve as a benchmark to study and improve robustness of DNNs. DAmageNet could be downloaded from http://www.pami.sjtu.edu.cn/Show/56/122.

Submitted 15 December, 2019; originally announced December 2019.

arXiv:1911.03183 [pdf, other]

Privacy-Preserving Generalized Linear Models using Distributed Block Coordinate Descent

Authors: Erik-Jan van Kesteren, Chang Sun, Daniel L. Oberski, Michel Dumontier, Lianne Ippel

Abstract: Combining data from varied sources has considerable potential for knowledge discovery: collaborating data parties can mine data in an expanded feature space, allowing them to explore a larger range of scientific questions. However, data sharing among different parties is highly restricted by legal conditions, ethical concerns, and / or data volume. Fueled by these concerns, the fields of cryptography and distributed learning have made great progress towards privacy-preserving and distributed data mining. However, practical implementations have been hampered by the limited scope or computational complexity of these methods. In this paper, we greatly extend the range of analyses available for vertically partitioned data, i.e., data collected by separate parties with different features on the same subjects. To this end, we present a novel approach for privacy-preserving generalized linear models, a fundamental and powerful framework underlying many prediction and classification procedures. We base our method on a distributed block coordinate descent algorithm to obtain parameter estimates, and we develop an extension to compute accurate standard errors without additional communication cost. We critically evaluate the information transfer for semi-honest collaborators and show that our protocol is secure against data reconstruction. Through both simulated and real-world examples we illustrate the functionality of our proposed algorithm. Without leaking information, our method performs as well on vertically partitioned data as existing methods on combined data -- all within mere minutes of computation time. We conclude that our method is a viable approach for vertically partitioned data analysis with a wide range of real-world applications.

Submitted 8 November, 2019; originally announced November 2019.

Comments: Fully reproducible code for all results and images can be found at https://github.com/vankesteren/privacy-preserving-glm, and the software package can be found at https://github.com/vankesteren/privreg

arXiv:1907.12706 [pdf, other]

Model-Free Unsupervised Learning for Optimization Problems with Constraints

Authors: Chengjian Sun, Dong Liu, Chenyang Yang

Abstract: In many optimization problems in wireless communications, the expressions of the objective function or constraints are hard or even impossible to derive, which makes the solutions difficult to find. In this paper, we propose a model-free learning framework to solve constrained optimization problems without the supervision of the optimal solution. Neural networks are used respectively for parameterizing the function to be optimized, parameterizing the Lagrange multiplier associated with instantaneous constraints, and approximating the unknown objective function or constraints. We provide learning algorithms to train all the neural networks simultaneously, and reveal the connections of the proposed framework with reinforcement learning. Numerical and simulation results validate the proposed framework and demonstrate the efficiency of model-free learning by taking a power control problem as an example.

Submitted 29 July, 2019; originally announced July 2019.

Comments: Submitted to Asia-Pacific Conference on Communications (APCC)

arXiv:1906.05743 [pdf, other]

Learning Video Representations using Contrastive Bidirectional Transformer

Authors: Chen Sun, Fabien Baradel, Kevin Murphy, Cordelia Schmid

Abstract: This paper proposes a self-supervised learning approach for video features that results in significantly improved performance on downstream tasks (such as video classification, captioning and segmentation) compared to existing methods. Our method extends the BERT model for text sequences to the case of sequences of real-valued feature vectors, by replacing the softmax loss with noise contrastive estimation (NCE). We also show how to learn representations from sequences of visual features and sequences of words derived from ASR (automatic speech recognition), and show that such cross-modal training (when possible) helps even more.

Submitted 27 September, 2019; v1 submitted 13 June, 2019; originally announced June 2019.

arXiv:1905.13014 [pdf, ps, other]

Unsupervised Deep Learning for Ultra-reliable and Low-latency Communications

Authors: Chengjian Sun, Chenyang Yang

Abstract: In this paper, we study how to solve resource allocation problems in ultra-reliable and low-latency communications by unsupervised deep learning, which often yield functional optimization problems with quality-of-service (QoS) constraints. We take a joint power and bandwidth allocation problem as an example, which minimizes the total bandwidth required to guarantee the QoS of each user in terms of the delay bound and overall packet loss probability. The global optimal solution is found in a symmetric scenario. A neural network was introduced to find an approximated optimal solution in general scenarios, where the QoS is ensured by using the property that the optimal solution should satisfy as the "supervision signal". Simulation results show that the learning-based solution performs the same as the optimal solution in the symmetric scenario, and can save around 40% bandwidth with respect to the state-of-the-art policy.

Submitted 5 June, 2019; v1 submitted 25 April, 2019; originally announced May 2019.

Comments: 6 pages, 1 figure, submitted to IEEE for possible publication. arXiv admin note: text overlap with arXiv:1905.11017

arXiv:1905.11017 [pdf, ps, other]

Learning to Optimize with Unsupervised Learning: Training Deep Neural Networks for URLLC

Authors: Chengjian Sun, Chenyang Yang

Abstract: Learning the optimized solution as a function of environmental parameters is effective in solving numerical optimization in real time for time-sensitive applications. Existing works on learning to optimize train deep neural networks (DNNs) with labels, and the learnt solutions are inaccurate, so they cannot be employed to ensure the stringent quality of service. In this paper, we propose a framework to learn the latent function with unsupervised deep learning, where the property that the optimal solution should satisfy is used as the "supervision signal" implicitly. The framework is applicable to both functional and variable optimization problems with constraints. We take a variable optimization problem in ultra-reliable and low-latency communications as an example, which demonstrates that the ultra-high reliability can be supported by the DNN without supervision labels.

Submitted 27 May, 2019; originally announced May 2019.

Comments: 7 pages, 1 figure, submitted to IEEE for possible publication

arXiv:1905.06744 [pdf, other]

Forecasting Wireless Demand with Extreme Values using Feature Embedding in Gaussian Processes

Authors: Chengyao Sun, Weisi Guo

Abstract: Wireless traffic prediction is a fundamental enabler to proactive network optimisation in beyond 5G. Forecasting extreme demand spikes and troughs due to traffic mobility is essential to avoiding outages and improving energy efficiency. Current state-of-the-art deep learning forecasting methods predominantly focus on overall forecast performance and do not offer probabilistic uncertainty quantification (UQ). Whilst Gaussian Process (GP) models have UQ capability, it is not able to predict extreme values very well. Here, we design a feature embedding (FE) kernel for a GP model to forecast traffic demand with extreme values. Using real 4G base station data, we compare our FE-GP performance against both conventional naive GPs, ARIMA models, as well as demonstrate the UQ output. For short-term extreme value prediction, we demonstrated a 32% reduction vs. S-ARIMA and 17% reduction vs. Naive-GP. For long-term average value prediction, we demonstrated a 21% reduction vs. S-ARIMA and 12% reduction vs. Naive-GP. The FE kernel also enabled us to create a flexible trade-off between overall forecast accuracy against peak-trough accuracy. The advantage over neural network (e.g. CNN, LSTM) is that the probabilistic forecast uncertainty can inform us of the risk of predictions, as well as the full posterior distribution of the forecast.

Submitted 1 November, 2019; v1 submitted 15 May, 2019; originally announced May 2019.

arXiv:1902.09641 [pdf, other]

Stochastic Prediction of Multi-Agent Interactions from Partial Observations

Authors: Chen Sun, Per Karlsson, Jiajun Wu, Joshua B Tenenbaum, Kevin Murphy

Abstract: We present a method that learns to integrate temporal information, from a learned dynamics model, with ambiguous visual information, from a learned vision model, in the context of interacting agents. Our method is based on a graph-structured variational recurrent neural network (Graph-VRNN), which is trained end-to-end to infer the current state of the (partially observed) world, as well as to forecast future states. We show that our method outperforms various baselines on two sports datasets, one based on real basketball trajectories, and one generated by a soccer game engine.

Submitted 25 February, 2019; originally announced February 2019.

Comments: ICLR 2019 camera ready

arXiv:1812.02035 [pdf, other]

Stochastic Model Pruning via Weight Dropping Away and Back

Authors: Haipeng Jia, Xueshuang Xiang, Da Fan, Meiyu Huang, Changhao Sun, Yang He

Abstract: Deep neural networks have dramatically achieved great success on a variety of challenging tasks. However, most successful DNNs have an extremely complex structure, leading to extensive research on model compression. As a significant area of progress in model compression, traditional gradual pruning approaches involve an iterative prune-retrain procedure and may suffer from two critical issues: local importance judgment, where the pruned weights are merely unimportant in the current model; and an irretrievable pruning process, where the pruned weights have no chance to come back. Addressing these two issues, this paper proposes the Drop Pruning approach, which leverages stochastic optimization in the pruning process by introducing a drop strategy at each pruning step, namely, drop away, which stochastically deletes some unimportant weights, and drop back, which stochastically recovers some pruned weights. The suitable choice of drop probabilities decreases the model size during the pruning process and helps it flow to the target sparsity. Compared to the Bayesian approaches that stochastically train a compact model for pruning, we directly aim at stochastic gradual pruning. We provide a detailed analysis showing that the drop away and drop back approaches have individual contributions. Moreover, Drop Pruning can achieve competitive compression performance and accuracy on many benchmark tasks compared with state-of-the-art weights pruning and Bayesian training approaches.

Submitted 9 April, 2020; v1 submitted 5 December, 2018; originally announced December 2018.

arXiv:1809.00594 [pdf, other]

Adversarial Attack Type I: Cheat Classifiers by Significant Changes

Authors: Sanli Tang, Xiaolin Huang, Mingjian Chen, Chengjin Sun, Jie Yang

Abstract: Despite the great success of deep neural networks, the adversarial attack can cheat some well-trained classifiers by small permutations. In this paper, we propose another type of adversarial attack that can cheat classifiers by significant changes. For example, we can significantly change a face but well-trained neural networks still recognize the adversarial and the original example as the same person. Statistically, the existing adversarial attack increases Type II error and the proposed one aims at Type I error, which are hence named as Type II and Type I adversarial attack, respectively. The two types of attack are equally important but are essentially different, which are intuitively explained and numerically evaluated. To implement the proposed attack, a supervised variation autoencoder is designed and then the classifier is attacked by updating the latent variables using gradient information. Besides, with pre-trained generative models, Type I attack on latent spaces is investigated as well. Experimental results show that our method is practical and effective to generate Type I adversarial examples on large-scale image datasets. Most of these generated examples can pass detectors designed for defending Type II attack and the strengthening strategy is only efficient with a specific type attack, both implying that the underlying reasons for Type I and Type II attack are different.

Submitted 22 July, 2019; v1 submitted 3 September, 2018; originally announced September 2018.

Showing 1–36 of 36 results for author: Sun, C