-
SF-DQN: Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning
Authors:
Shuai Zhang,
Heshan Devaka Fernando,
Miao Liu,
Keerthiram Murugesan,
Songtao Lu,
Pin-Yu Chen,
Tianyi Chen,
Meng Wang
Abstract:
This paper studies the transfer reinforcement learning (RL) problem where multiple RL problems have different reward functions but share the same underlying transition dynamics. In this setting, the Q-function of each RL problem (task) can be decomposed into a successor feature (SF) and a reward mapping: the former characterizes the transition dynamics, and the latter characterizes the task-specific reward function. This Q-function decomposition, coupled with a policy improvement operator known as generalized policy improvement (GPI), reduces the sample complexity of finding the optimal Q-function, and thus the SF & GPI framework exhibits promising empirical performance compared to traditional RL methods like Q-learning. However, its theoretical foundations remain largely unestablished, especially when learning the successor features using deep neural networks (SF-DQN). This paper studies provable knowledge transfer using SF-DQN in transfer RL problems. We establish the first convergence analysis with provable generalization guarantees for SF-DQN with GPI. The theory reveals that SF-DQN with GPI outperforms conventional RL approaches, such as deep Q-network, in terms of both faster convergence rate and better generalization. Numerical experiments on real and synthetic RL tasks support the superior performance of SF-DQN & GPI, aligning with our theoretical findings.
Submitted 24 May, 2024;
originally announced May 2024.
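The decomposition described in this abstract can be illustrated with a small sketch. It assumes a tabular setting where the Q-function is the inner product of a successor feature vector and a reward-mapping vector, and where GPI picks the action that maximizes the best Q-value over a set of previously learned policies. The function names and all numbers are hypothetical and chosen only for illustration; this is not the SF-DQN algorithm from the paper, which learns the successor features with deep neural networks.

```python
# Toy illustration of Q(s, a) = psi(s, a)^T w and generalized policy improvement (GPI).
# All names and numbers are hypothetical; nothing here is taken from the paper.

def q_value(psi_sa, w):
    """Q(s, a) as the inner product of a successor feature and a reward mapping."""
    return sum(p * wi for p, wi in zip(psi_sa, w))


def gpi_action(psi_per_policy, w):
    """For one state, return argmax_a max_i psi_i(s, a)^T w."""
    n_actions = len(psi_per_policy[0])
    best_per_action = [
        max(q_value(psi[a], w) for psi in psi_per_policy)
        for a in range(n_actions)
    ]
    return max(range(n_actions), key=lambda a: best_per_action[a])


# Successor features of two source policies at a single state:
# 2 actions, feature dimension 2 (made-up values).
psi_policy_1 = [[1.0, 0.0], [0.2, 0.3]]
psi_policy_2 = [[0.1, 0.4], [0.0, 1.0]]

# Reward mapping of a new task that only rewards the second feature.
w_new_task = [0.0, 1.0]

action = gpi_action([psi_policy_1, psi_policy_2], w_new_task)
print(action)
```

Because the successor features depend only on the shared dynamics and the source policies, switching to a new task in this sketch only requires a new reward-mapping vector.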
-
Generative Modeling for Tabular Data via Penalized Optimal Transport Network
Authors:
Wenhui Sophia Lu,
Chenyang Zhong,
Wing Hung Wong
Abstract:
The task of precisely learning the probability distribution of rows within tabular data and producing authentic synthetic samples is both crucial and non-trivial. Wasserstein generative adversarial network (WGAN) marks a notable improvement in generative modeling, addressing the challenges faced by its predecessor, generative adversarial network. However, due to the mixed data types and multimodalities prevalent in tabular data, the delicate equilibrium between the generator and discriminator, as well as the inherent instability of Wasserstein distance in high dimensions, WGAN often fails to produce high-fidelity samples. To this end, we propose POTNet (Penalized Optimal Transport Network), a generative deep neural network based on a novel, robust, and interpretable marginally-penalized Wasserstein (MPW) loss. POTNet can effectively model tabular data containing both categorical and continuous features. Moreover, it offers the flexibility to condition on a subset of features. We provide theoretical justifications for the motivation behind the MPW loss. We also empirically demonstrate the effectiveness of our proposed method on four different benchmarks across a variety of real-world and simulated datasets. Our proposed model achieves orders of magnitude speedup during the sampling stage compared to state-of-the-art generative models for tabular data, thereby enabling efficient large-scale synthetic data generation.
Submitted 16 February, 2024;
originally announced February 2024.
-
Decentralized Bilevel Optimization over Graphs: Loopless Algorithmic Update and Transient Iteration Complexity
Authors:
Boao Kong,
Shuchen Zhu,
Songtao Lu,
Xinmeng Huang,
Kun Yuan
Abstract:
Stochastic bilevel optimization (SBO) is becoming increasingly essential in machine learning due to its versatility in handling nested structures. To address large-scale SBO, decentralized approaches have emerged as effective paradigms in which nodes communicate with immediate neighbors without a central server, thereby improving communication efficiency and enhancing algorithmic robustness. However, current decentralized SBO algorithms face challenges, including expensive inner-loop updates and unclear understanding of the influence of network topology, data heterogeneity, and the nested bilevel algorithmic structures. In this paper, we introduce a single-loop decentralized SBO (D-SOBA) algorithm and establish its transient iteration complexity, which, for the first time, clarifies the joint influence of network topology and data heterogeneity on decentralized bilevel algorithms. D-SOBA achieves the state-of-the-art asymptotic rate, asymptotic gradient/Hessian complexity, and transient iteration complexity under more relaxed assumptions compared to existing methods. Numerical experiments validate our theoretical findings.
Submitted 26 February, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
Joint Unsupervised and Supervised Training for Automatic Speech Recognition via Bilevel Optimization
Authors:
A F M Saif,
Xiaodong Cui,
Han Shen,
Songtao Lu,
Brian Kingsbury,
Tianyi Chen
Abstract:
In this paper, we present a novel bilevel optimization-based approach for training acoustic models for automatic speech recognition (ASR) tasks, which we term bi-level joint unsupervised and supervised training (BL-JUST). BL-JUST employs a lower-level and an upper-level optimization with an unsupervised loss and a supervised loss, respectively, leveraging recent advances in penalty-based bilevel optimization to solve this challenging ASR problem with affordable complexity and rigorous convergence guarantees. To evaluate BL-JUST, extensive experiments on the LibriSpeech and TED-LIUM v2 datasets have been conducted. BL-JUST achieves superior performance over the commonly used pre-training followed by fine-tuning strategy.
Submitted 13 January, 2024;
originally announced January 2024.
-
Principal Stratification with Continuous Post-Treatment Variables: Nonparametric Identification and Semiparametric Estimation
Authors:
Sizhu Lu,
Zhichao Jiang,
Peng Ding
Abstract:
Post-treatment variables often complicate causal inference. They appear in many scientific problems, including noncompliance, truncation by death, mediation, and surrogate endpoint evaluation. Principal stratification is a strategy to address these challenges by adjusting for the potential values of the post-treatment variables, defined as the principal strata. It allows for characterizing treatment effect heterogeneity across principal strata and unveiling the mechanism of the treatment's impact on the outcome related to post-treatment variables. However, the existing literature has primarily focused on binary post-treatment variables, leaving the case with continuous post-treatment variables largely unexplored. This gap persists due to the complexity of infinitely many principal strata, which present challenges to both the identification and estimation of causal effects. We fill this gap by providing nonparametric identification and semiparametric estimation theory for principal stratification with continuous post-treatment variables. We propose to use working models to approximate the underlying causal effect surfaces and derive the efficient influence functions of the corresponding model parameters. Based on the theory, we construct doubly robust estimators and implement them in an R package.
Submitted 3 April, 2024; v1 submitted 21 September, 2023;
originally announced September 2023.
-
Identifiability and estimation of the competing risks model under exclusion restrictions
Authors:
Munir Hiabu,
Simon M. S. LU,
Ralf A. Wilke
Abstract:
The non-identifiability of the competing risks model requires researchers to work with restrictions on the model to obtain informative results. We present a new identifiability solution based on an exclusion restriction. Many areas of applied research use methods that rely on exclusion restrictions. It appears natural to also use them for the identifiability of competing risks models. By imposing the exclusion restriction coupled with an Archimedean copula, we are able to avoid any parametric restriction on the marginal distributions. We introduce a semiparametric estimation approach for the nonparametric marginals and the parametric copula. Our simulation results demonstrate the usefulness of the suggested model, as the degree of risk dependence can be estimated without parametric restrictions on the marginal distributions.
Submitted 4 September, 2023;
originally announced September 2023.
-
Predicting Heart Disease and Reducing Survey Time Using Machine Learning Algorithms
Authors:
Salahaldeen Rababa,
Asma Yamin,
Shuxia Lu,
Ashraf Obaidat
Abstract:
Currently, many researchers and analysts are working toward medical diagnosis enhancement for various diseases. Heart disease is one of the common diseases that can be considered a significant cause of mortality worldwide. Early detection of heart disease significantly helps in reducing the risk of heart failure. Consequently, the Centers for Disease Control and Prevention (CDC) conducts a yearly health-related telephone survey of over 400,000 participants. However, several concerns arise regarding the reliability of the data in predicting heart disease and whether all of the survey questions are strongly related. This study aims to utilize several machine learning techniques, such as support vector machines and logistic regression, to investigate the accuracy of the CDC's heart disease survey in the United States. Furthermore, we use various feature selection methods to identify the most relevant subset of questions that can be utilized to forecast heart conditions. To reach a robust conclusion, we perform stability analysis by randomly sampling the data 300 times. The experimental results show that the survey data can be useful up to 80% in terms of predicting heart disease, which significantly improves the diagnostic process before bloodwork and tests. In addition, the amount of time spent conducting the survey can be reduced by 77% while maintaining the same level of performance.
Submitted 30 May, 2023;
originally announced June 2023.
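The stability analysis mentioned in this abstract (repeating feature selection on 300 random samples of the data) can be sketched roughly as follows. The data here are synthetic, and the selection rule (largest absolute gap between class means) is a simple stand-in rather than any of the feature selection methods used in the study. All names are hypothetical.

```python
import random


def top_feature(rows, labels):
    """Index of the feature with the largest absolute gap between class means."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return max(range(n_features), key=lambda j: scores[j])


# Synthetic data: feature 0 carries the signal, features 1 and 2 are pure noise.
rng = random.Random(0)
labels = [i % 2 for i in range(200)]
rows = [
    [3.0 * y + rng.gauss(0, 0.5), rng.gauss(0, 1), rng.gauss(0, 1)]
    for y in labels
]

# Repeat the selection on 300 random half-samples and count how often
# each feature comes out on top.
n_repeats = 300
counts = [0, 0, 0]
for _ in range(n_repeats):
    idx = rng.sample(range(200), 100)
    winner = top_feature([rows[i] for i in idx], [labels[i] for i in idx])
    counts[winner] += 1

print(counts)
```

A feature that is selected in nearly every repetition can be regarded as stable under resampling, whereas a feature selected only occasionally is more likely an artifact of one particular sample.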
-
Flexible sensitivity analysis for causal inference in observational studies subject to unmeasured confounding
Authors:
Sizhu Lu,
Peng Ding
Abstract:
Causal inference with observational studies often suffers from unmeasured confounding, yielding biased estimators based on the unconfoundedness assumption. Sensitivity analysis assesses how the causal conclusions change with respect to different degrees of unmeasured confounding. Most existing sensitivity analysis methods work well for specific types of statistical estimation or testing strategies. We propose a flexible sensitivity analysis framework that can deal with commonly used inverse probability weighting, outcome regression, and doubly robust estimators simultaneously. It is based on the well-known parametrization of the selection bias as comparisons of the observed and counterfactual outcomes conditional on observed covariates. It is attractive for practical use because it only requires simple modifications of the standard estimators. Moreover, it naturally extends to many other causal inference settings, including the causal risk ratio or odds ratio, the average causal effect on the treated units, and studies with survival outcomes. We also develop an R package saci to implement our sensitivity analysis estimators.
Submitted 29 March, 2024; v1 submitted 28 May, 2023;
originally announced May 2023.
-
Delayed and Indirect Impacts of Link Recommendations
Authors:
Han Zhang,
Shangen Lu,
Yixin Wang,
Mihaela Curmei
Abstract:
The impacts of link recommendations on social networks are challenging to evaluate, and so far they have been studied in limited settings. Observational studies are restricted in the kinds of causal questions they can answer and naive A/B tests often lead to biased evaluations due to unaccounted network interference. Furthermore, evaluations in simulation settings are often limited to static network models that do not take into account the potential feedback loops between link recommendation and organic network evolution. To this end, we study the impacts of recommendations on social networks in dynamic settings. Adopting a simulation-based approach, we consider an explicit dynamic formation model -- an extension of the celebrated Jackson-Rogers model -- and investigate how link recommendations affect network evolution over time. Empirically, we find that link recommendations have surprising delayed and indirect effects on the structural properties of networks. Specifically, we find that link recommendations can exhibit considerably different impacts in the immediate term and in the long term. For instance, we observe that friend-of-friend recommendations can have an immediate effect in decreasing degree inequality, but in the long term, they can make the degree distribution substantially more unequal. Moreover, we show that the effects of recommendations can persist in networks, in part due to their indirect impacts on natural dynamics even after recommendations are turned off. We show that, in counterfactual simulations, removing the indirect effects of link recommendations can make the network trend faster toward what it would have been under natural growth dynamics.
Submitted 16 March, 2023;
originally announced March 2023.
-
Direct and diffuse shading factors modelling for the most representative agrivoltaic system layouts
Authors:
Sebastian Zainali,
Silvia Ma Lu,
Bengt Stridh,
Anders Avelin,
Stefano Amaducci,
Michele Colauzzi,
Pietro Elia Campana
Abstract:
Agrivoltaic systems are becoming more popular as a critical technology for attaining several sustainable development goals such as affordable and clean energy, zero hunger, clean water and sanitation, and climate action. However, understanding the shading effects on crops is fundamental to choosing an optimal agrivoltaic system as a wrong choice could lead to severe crop reductions. In this study, fixed vertical, one-axis tracking, and two-axis tracking photovoltaic arrays for agrivoltaic applications are developed to analyse the shading conditions on the ground used for crop production. The models have shown remarkably similar accuracy compared to commercial software such as PVsyst and SketchUp. The developed models will help reduce the crop yield uncertainty under agrivoltaic systems by providing accurate photosynthetically active radiation distribution at the crop level. The distribution was further analysed using a light homogeneity index and calculating the yearly photosynthetically active radiation reduction. The homogeneity and photosynthetically active radiation reduction varied significantly depending on the agrivoltaic system design, from 91% to 95% and 11% to 34%, respectively. To identify the most suitable agrivoltaic system layout dependent on crop and geographical location, it is of fundamental importance to study the effect of shadings with distribution analysis.
Submitted 9 August, 2022;
originally announced August 2022.
-
INTERACT: Achieving Low Sample and Communication Complexities in Decentralized Bilevel Learning over Networks
Authors:
Zhuqing Liu,
Xin Zhang,
Prashant Khanduri,
Songtao Lu,
Jia Liu
Abstract:
In recent years, decentralized bilevel optimization problems have received increasing attention in the networking and machine learning communities thanks to their versatility in modeling decentralized learning problems over peer-to-peer networks (e.g., multi-agent meta-learning, multi-agent reinforcement learning, personalized training, and Byzantine-resilient learning). However, for decentralized bilevel optimization over peer-to-peer networks with limited computation and communication capabilities, how to achieve low sample and communication complexities are two fundamental challenges that remain under-explored so far. In this paper, we make the first attempt to investigate the class of decentralized bilevel optimization problems with nonconvex and strongly-convex structure corresponding to the outer and inner subproblems, respectively. Our main contributions in this paper are two-fold: i) We first propose a deterministic algorithm called INTERACT (inner-gradient-descent-outer-tracked-gradient) that requires the sample complexity of $\mathcal{O}(n ε^{-1})$ and communication complexity of $\mathcal{O}(ε^{-1})$ to solve the bilevel optimization problem, where $n$ and $ε> 0$ are the number of samples at each agent and the desired stationarity gap, respectively. ii) To relax the need for full gradient evaluations in each iteration, we propose a stochastic variance-reduced version of INTERACT (SVR-INTERACT), which improves the sample complexity to $\mathcal{O}(\sqrt{n} ε^{-1})$ while achieving the same communication complexity as the deterministic algorithm. To our knowledge, this work is the first that achieves both low sample and communication complexities for solving decentralized bilevel optimization problems over networks. Our numerical experiments also corroborate our theoretical findings.
Submitted 5 October, 2022; v1 submitted 27 July, 2022;
originally announced July 2022.
-
Understanding Benign Overfitting in Gradient-Based Meta Learning
Authors:
Lisha Chen,
Songtao Lu,
Tianyi Chen
Abstract:
Meta learning has demonstrated tremendous success in few-shot learning with limited supervised data. In those settings, the meta model is usually overparameterized. While the conventional statistical learning theory suggests that overparameterized models tend to overfit, empirical evidence reveals that overparameterized meta learning methods still work well -- a phenomenon often called "benign overfitting." To understand this phenomenon, we focus on the meta learning settings with a challenging bilevel structure that we term the gradient-based meta learning, and analyze its generalization performance under an overparameterized meta linear regression model. While our analysis uses the relatively tractable linear models, our theory contributes to understanding the delicate interplay among data heterogeneity, model adaptation and benign overfitting in gradient-based meta learning tasks. We corroborate our theoretical claims through numerical simulations.
Submitted 9 November, 2022; v1 submitted 27 June, 2022;
originally announced June 2022.
-
Understanding Latent Correlation-Based Multiview Learning and Self-Supervision: An Identifiability Perspective
Authors:
Qi Lyu,
Xiao Fu,
Weiran Wang,
Songtao Lu
Abstract:
Multiple views of data, both naturally acquired (e.g., image and audio) and artificially produced (e.g., via adding different noise to data samples), have proven useful in enhancing representation learning. Natural views are often handled by multiview analysis tools, e.g., (deep) canonical correlation analysis [(D)CCA], while the artificial ones are frequently used in self-supervised learning (SSL) paradigms, e.g., BYOL and Barlow Twins. Both types of approaches often involve learning neural feature extractors such that the embeddings of data exhibit high cross-view correlations. Although intuitive, the effectiveness of correlation-based neural embedding is mostly empirically validated.
This work aims to understand latent correlation maximization-based deep multiview learning from a latent component identification viewpoint. An intuitive generative model of multiview data is adopted, where the views are different nonlinear mixtures of shared and private components. Since the shared components are view/distortion-invariant, representing the data using such components is believed to reveal the identity of the samples effectively and robustly. Under this model, latent correlation maximization is shown to guarantee the extraction of the shared components across views (up to certain ambiguities). In addition, it is further shown that the private information in each view can be provably disentangled from the shared using proper regularization design. A finite sample analysis, which has been rare in nonlinear mixture identifiability study, is also presented. The theoretical results and newly designed regularization are tested on a series of tasks.
Submitted 8 April, 2022; v1 submitted 13 June, 2021;
originally announced June 2021.
-
Understanding Heart-Failure Patients EHR Clinical Features via SHAP Interpretation of Tree-Based Machine Learning Model Predictions
Authors:
Shuyu Lu,
Ruoyu Chen,
Wei Wei,
Xinghua Lu
Abstract:
Heart failure (HF) is a major cause of mortality. Accurately monitoring HF progress and adjusting therapies are critical for improving patient outcomes. An experienced cardiologist can make accurate HF stage diagnoses based on a combination of symptoms, signs, and lab results from the electronic health records (EHR) of a patient, without directly measuring heart function. We examined whether machine learning models, more specifically the XGBoost model, can accurately predict patient stage based on EHR, and we further applied the SHapley Additive exPlanations (SHAP) framework to identify informative features and their interpretations. Our results indicate that based on structured data from EHR, our models could predict patients' ejection fraction (EF) scores with moderate accuracy. SHAP analyses identified informative features and revealed potential clinical subtypes of HF. Our findings provide insights on how to design computing systems to accurately monitor disease progression of HF patients through continuously mining patients' EHR data.
Submitted 20 March, 2021;
originally announced March 2021.
-
Tensor networks and efficient descriptions of classical data
Authors:
Sirui Lu,
Márton Kanász-Nagy,
Ivan Kukuljan,
J. Ignacio Cirac
Abstract:
We investigate the potential of tensor network based machine learning methods to scale to large image and text data sets. For that, we study how the mutual information between a subregion and its complement scales with the subsystem size $L$, similarly to how it is done in quantum many-body physics. We find that for text, the mutual information scales as a power law $L^ν$ with an exponent close to that of a volume law, indicating that text cannot be efficiently described by 1D tensor networks. For images, the scaling is close to an area law, hinting that 2D tensor networks such as PEPS could have adequate expressibility. For the numerical analysis, we introduce a mutual information estimator based on autoregressive networks, and we also use convolutional neural networks in a neural estimator method.
Submitted 11 March, 2021;
originally announced March 2021.
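The power-law scaling $L^ν$ described in this abstract suggests one simple way an exponent could be read off from measurements: a least-squares fit of log mutual information against log subsystem size. The sketch below assumes such a log-log fit and uses synthetic values generated from a known exponent. It does not reproduce the estimators or the data of the paper, and the function name is hypothetical.

```python
import math


def fit_power_law_exponent(sizes, values):
    """Least-squares slope of log(values) against log(sizes)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Synthetic "mutual information" values generated as 0.5 * L**0.9 (not real data).
sizes = [2, 4, 8, 16, 32]
synthetic_mi = [0.5 * size ** 0.9 for size in sizes]

nu = fit_power_law_exponent(sizes, synthetic_mi)
print(round(nu, 6))
```

For a one-dimensional sequence, a fitted exponent near 1 corresponds to volume-law scaling, while an exponent near 0 corresponds to a mutual information that stays roughly constant as the subsystem grows.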
-
Revisiting Smoothed Online Learning
Authors:
Lijun Zhang,
Wei Jiang,
Shiyin Lu,
Tianbao Yang
Abstract:
In this paper, we revisit the problem of smoothed online learning, in which the online learner suffers both a hitting cost and a switching cost, and target two performance metrics: competitive ratio and dynamic regret with switching cost.
To bound the competitive ratio, we assume the hitting cost is known to the learner in each round, and investigate the simple idea of balancing the two costs by an optimization problem. Surprisingly, we find that minimizing the hitting cost alone is $\max(1, \frac{2}{α})$-competitive for $α$-polyhedral functions and $1 + \frac{4}{λ}$-competitive for $λ$-quadratic growth functions, both of which improve state-of-the-art results significantly. Moreover, when the hitting cost is both convex and $λ$-quadratic growth, we reduce the competitive ratio to $1 + \frac{2}{\sqrt{λ}}$ by minimizing the weighted sum of the hitting cost and the switching cost.
To bound the dynamic regret with switching cost, we follow the standard setting of online convex optimization, in which the hitting cost is convex but hidden from the learner before making predictions. We slightly modify Ader, an existing algorithm designed for dynamic regret, to take into account the switching cost when measuring the performance. The proposed algorithm, named Smoothed Ader, attains an optimal $O(\sqrt{T(1+P_T)})$ bound for dynamic regret with switching cost, where $P_T$ is the path-length of the comparator sequence. Furthermore, if the hitting cost is accessible at the beginning of each round, we obtain a similar guarantee without the bounded gradient condition, and establish an $Ω(\sqrt{T(1+P_T)})$ lower bound to confirm the optimality.
Submitted 18 May, 2021; v1 submitted 13 February, 2021;
originally announced February 2021.
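The two strategies contrasted in this abstract, minimizing the hitting cost alone and minimizing a weighted sum of hitting and switching costs, can be sketched in one dimension. The sketch assumes a quadratic hitting cost $(x - v_t)^2$ and an absolute-value switching cost $|x_t - x_{t-1}|$, for which the weighted-sum minimizer has a closed form (a soft-threshold step toward the target). The cost forms, the weight, and the target sequence are illustrative assumptions and are not taken from the paper.

```python
def balanced_step(prev_x, target, weight):
    """Closed-form minimizer of (x - target)**2 + weight * abs(x - prev_x).

    With weight = 0 this reduces to minimizing the hitting cost alone.
    """
    gap = target - prev_x
    move = max(abs(gap) - weight / 2.0, 0.0)  # soft-threshold the move
    return prev_x + (move if gap >= 0 else -move)


def total_cost(targets, weight, x0=0.0):
    """Sum of hitting costs (x_t - v_t)**2 and switching costs |x_t - x_{t-1}|."""
    x_prev, cost = x0, 0.0
    for v in targets:
        x = balanced_step(x_prev, v, weight)
        cost += (x - v) ** 2 + abs(x - x_prev)
        x_prev = x
    return cost


# A made-up oscillating sequence of hitting-cost minimizers.
targets = [1.0, -1.0, 1.0, -1.0]

hitting_only_cost = total_cost(targets, weight=0.0)
weighted_sum_cost = total_cost(targets, weight=1.0)
print(hitting_only_cost, weighted_sum_cost)
```

On this particular sequence the weighted-sum rule moves less far in each round and trades a small hitting cost for a smaller switching cost. The comparison says nothing about competitive ratios in general, which depend on the function class analyzed in the paper.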
-
A self-adaptive and robust fission clustering algorithm via heat diffusion and maximal turning angle
Authors:
Yu Han,
Shizhan Lu,
Haiyan Xu
Abstract:
Cluster analysis, which focuses on the grouping and categorization of similar elements, is widely used in various fields of research. A novel and fast clustering algorithm, the fission clustering algorithm, was proposed in recent years. In this article, we propose a robust fission clustering (RFC) algorithm and a self-adaptive noise identification method. The RFC and the self-adaptive noise identification method are combined into a self-adaptive robust fission clustering (SARFC) algorithm. Several frequently used datasets were applied to test the performance of the proposed clustering approach and to compare the results with those of other algorithms. The comprehensive comparisons indicate that the proposed method has advantages over other common methods.
Submitted 7 February, 2021;
originally announced February 2021.
-
Overcoming Catastrophic Forgetting via Direction-Constrained Optimization
Authors:
Yunfei Teng,
Anna Choromanska,
Murray Campbell,
Songtao Lu,
Parikshit Ram,
Lior Horesh
Abstract:
This paper studies a new design of the optimization algorithm for training deep learning models with a fixed architecture of the classification network in a continual learning framework. The training data is non-stationary and the non-stationarity is imposed by a sequence of distinct tasks. We first analyze a deep model trained on only one learning task in isolation and identify a region in network parameter space, where the model performance is close to the recovered optimum. We provide empirical evidence that this region resembles a cone that expands along the convergence direction. We study the principal directions of the trajectory of the optimizer after convergence and show that traveling along a few top principal directions can quickly bring the parameters outside the cone but this is not the case for the remaining directions. We argue that catastrophic forgetting in a continual learning setting can be alleviated when the parameters are constrained to stay within the intersection of the plausible cones of individual tasks that were so far encountered during training. Based on this observation, we present our direction-constrained optimization (DCO) method, where for each task we introduce a linear autoencoder to approximate its corresponding top forbidden principal directions. They are then incorporated into the loss function in the form of a regularization term for the purpose of learning the coming tasks without forgetting. Furthermore, in order to control the memory growth as the number of tasks increases, we propose a memory-efficient version of our algorithm called compressed DCO (DCO-COMP) that allocates a memory of fixed size for storing all autoencoders. We empirically demonstrate that our algorithm performs favorably compared to other state-of-the-art regularization-based continual learning methods.
Submitted 1 July, 2022; v1 submitted 25 November, 2020;
originally announced November 2020.
-
Learning to Generate Image Source-Agnostic Universal Adversarial Perturbations
Authors:
Pu Zhao,
Parikshit Ram,
Songtao Lu,
Yuguang Yao,
Djallel Bouneffouf,
Xue Lin,
Sijia Liu
Abstract:
Adversarial perturbations are critical for certifying the robustness of deep learning models. A universal adversarial perturbation (UAP) can simultaneously attack multiple images, and thus offers a more unified threat model, obviating an image-wise attack algorithm. However, the existing UAP generator is underdeveloped when images are drawn from different image sources (e.g., with different image resolutions). Towards an authentic universality across image sources, we take a novel view of UAP generation as a customized instance of few-shot learning, which leverages bilevel optimization and learning-to-optimize (L2O) techniques for UAP generation with improved attack success rate (ASR). We begin by considering the popular model agnostic meta-learning (MAML) framework to meta-learn a UAP generator. However, we see that the MAML framework does not directly offer the universal attack across image sources, requiring us to integrate it with another meta-learning framework of L2O. The resulting scheme for meta-learning a UAP generator (i) has better performance (50% higher ASR) than baselines such as Projected Gradient Descent, (ii) has better performance (37% faster) than the vanilla L2O and MAML frameworks (when applicable), and (iii) is able to simultaneously handle UAP generation for different victim models and image data sources.
Submitted 17 August, 2022; v1 submitted 28 September, 2020;
originally announced September 2020.
-
DARTS-: Robustly Stepping out of Performance Collapse Without Indicators
Authors:
Xiangxiang Chu,
Xiaoxing Wang,
Bo Zhang,
Shun Lu,
Xiaolin Wei,
Junchi Yan
Abstract:
Despite the fast development of differentiable architecture search (DARTS), it suffers from long-standing performance instability, which severely limits its application. Existing robustifying methods draw clues from the resulting deteriorated behavior instead of identifying its cause. Various indicators such as Hessian eigenvalues are proposed as a signal to stop searching before the performance collapses. However, these indicator-based methods tend to easily reject good architectures if the thresholds are inappropriately set, not to mention that the search is intrinsically noisy. In this paper, we undertake a more subtle and direct approach to resolve the collapse. We first demonstrate that skip connections have a clear advantage over other candidate operations, in that they can easily recover from a disadvantageous state and become dominant. We conjecture that this privilege is causing degenerated performance. Therefore, we propose to factor out this benefit with an auxiliary skip connection, ensuring a fairer competition for all operations. We call this approach DARTS-. Extensive experiments on various datasets verify that it can substantially improve robustness. Our code is available at https://github.com/Meituan-AutoML/DARTS- .
Submitted 15 January, 2021; v1 submitted 2 September, 2020;
originally announced September 2020.
-
Orthogonalized SGD and Nested Architectures for Anytime Neural Networks
Authors:
Chengcheng Wan,
Henry Hoffmann,
Shan Lu,
Michael Maire
Abstract:
We propose a novel variant of SGD customized for training network architectures that support anytime behavior: such networks produce a series of increasingly accurate outputs over time. Efficient architectural designs for these networks focus on re-using internal state; subnetworks must produce representations relevant for both immediate prediction as well as refinement by subsequent network stages. We consider traditional branched networks as well as a new class of recursively nested networks. Our new optimizer, Orthogonalized SGD, dynamically re-balances task-specific gradients when training a multitask network. In the context of anytime architectures, this optimizer projects gradients from later outputs onto a parameter subspace that does not interfere with those from earlier outputs. Experiments demonstrate that training with Orthogonalized SGD significantly improves generalization accuracy of anytime networks.
Submitted 14 August, 2020;
originally announced August 2020.
-
Exponentially Weighted l_2 Regularization Strategy in Constructing Reinforced Second-order Fuzzy Rule-based Model
Authors:
Congcong Zhang,
Sung-Kwun Oh,
Witold Pedrycz,
Zunwei Fu,
Shanzhen Lu
Abstract:
In the conventional Takagi-Sugeno-Kang (TSK)-type fuzzy models, constant or linear functions are usually utilized as the consequent parts of the fuzzy rules, but they cannot effectively describe the behavior within local regions defined by the antecedent parts. In this article, a theoretical and practical design methodology is developed to address this problem. First, the information granulation (Fuzzy C-Means) method is applied to capture the structure in the data and split the input space into subspaces, as well as form the antecedent parts. Second, the quadratic polynomials (QPs) are employed as the consequent parts. Compared with constant and linear functions, QPs can describe the input-output behavior within the local regions (subspaces) by refining the relationship between input and output variables. However, although QPs can improve the approximation ability of the model, they could lead to the deterioration of the prediction ability of the model (e.g., overfitting). To handle this issue, we introduce an exponential weight approach inspired by the weight function theory encountered in harmonic analysis. More specifically, we adopt the exponential functions as the targeted penalty terms, which are equipped with l_2 regularization (i.e., exponentially weighted l_2, ewl_2) to match the proposed reinforced second-order fuzzy rule-based model (RSFRM) properly. The advantage of ewl_2 compared to ordinary l_2 lies in separately identifying and penalizing different types of polynomial terms in the coefficient estimation, and its results not only alleviate the overfitting and prevent the deterioration of generalization ability but also effectively release the prediction potential of the model.
Submitted 2 July, 2020;
originally announced July 2020.
-
An Early Warning Approach to Monitor COVID-19 Activity with Multiple Digital Traces in Near Real-Time
Authors:
Nicole E. Kogan,
Leonardo Clemente,
Parker Liautaud,
Justin Kaashoek,
Nicholas B. Link,
Andre T. Nguyen,
Fred S. Lu,
Peter Huybers,
Bernd Resch,
Clemens Havas,
Andreas Petutschnig,
Jessica Davis,
Matteo Chinazzi,
Backtosch Mustafa,
William P. Hanage,
Alessandro Vespignani,
Mauricio Santillana
Abstract:
Non-pharmaceutical interventions (NPIs) have been crucial in curbing COVID-19 in the United States (US). Consequently, relaxing NPIs through a phased re-opening of the US amid still-high levels of COVID-19 susceptibility could lead to new epidemic waves. This calls for a COVID-19 early warning system. Here we evaluate multiple digital data streams as early warning indicators of increasing or decreasing state-level US COVID-19 activity between January and June 2020. We estimate the timing of sharp changes in each data stream using a simple Bayesian model that calculates in near real-time the probability of exponential growth or decay. Analysis of COVID-19-related activity on social network microblogs, Internet searches, point-of-care medical software, and a metapopulation mechanistic model, as well as fever anomalies captured by smart thermometer networks, shows exponential growth roughly 2-3 weeks prior to comparable growth in confirmed COVID-19 cases and 3-4 weeks prior to comparable growth in COVID-19 deaths across the US over the last 6 months. We further observe exponential decay in confirmed cases and deaths 5-6 weeks after implementation of NPIs, as measured by anonymized and aggregated human mobility data from mobile phones. Finally, we propose a combined indicator for exponential growth in multiple data streams that may aid in developing an early warning system for future COVID-19 outbreaks. These efforts represent an initial exploratory framework, and both continued study of the predictive power of digital indicators as well as further development of the statistical approach are needed.
Submitted 3 July, 2020; v1 submitted 1 July, 2020;
originally announced July 2020.
-
Regularly Updated Deterministic Policy Gradient Algorithm
Authors:
Shuai Han,
Wenbo Zhou,
Shuai Lü,
Jiayu Yu
Abstract:
The Deep Deterministic Policy Gradient (DDPG) algorithm is one of the most well-known reinforcement learning methods. However, this method is inefficient and unstable in practical applications. In addition, the bias and variance of the Q estimation in the target function are sometimes difficult to control. This paper proposes a Regularly Updated Deterministic (RUD) policy gradient algorithm for these problems. This paper theoretically proves that the learning procedure with RUD can make better use of new data in the replay buffer than the traditional procedure. In addition, the low variance of the Q value in RUD is more suitable for the current Clipped Double Q-learning strategy. This paper presents a comparison experiment against previous methods, an ablation experiment with the original DDPG, and other analytical experiments in MuJoCo environments. The experimental results demonstrate the effectiveness and superiority of RUD.
Submitted 30 June, 2020;
originally announced July 2020.
-
NROWAN-DQN: A Stable Noisy Network with Noise Reduction and Online Weight Adjustment for Exploration
Authors:
Shuai Han,
Wenbo Zhou,
Jing Liu,
Shuai Lü
Abstract:
Deep reinforcement learning is increasingly widely applied, especially in various complex control tasks. Effective exploration for noisy networks is one of the most important issues in deep reinforcement learning. Noisy networks tend to produce stable outputs for agents. However, this tendency is not always enough to find a stable policy for an agent, which decreases efficiency and stability during the learning process. Based on NoisyNets, this paper proposes an algorithm called NROWAN-DQN, i.e., Noise Reduction and Online Weight Adjustment NoisyNet-DQN. Firstly, we develop a novel noise reduction method for NoisyNet-DQN to make the agent perform stable actions. Secondly, we design an online weight adjustment strategy for noise reduction, which improves stable performance and achieves higher scores for the agent. Finally, we evaluate this algorithm in four standard domains and analyze properties of hyper-parameters. Our results show that NROWAN-DQN outperforms prior algorithms in all these domains. In addition, NROWAN-DQN also shows better stability. The variance of the NROWAN-DQN score is significantly reduced, especially in some action-sensitive environments. This means that in some environments where high stability is required, NROWAN-DQN will be more appropriate than NoisyNet-DQN.
Submitted 19 June, 2020;
originally announced June 2020.
-
Non-convex Min-Max Optimization: Applications, Challenges, and Recent Theoretical Advances
Authors:
Meisam Razaviyayn,
Tianjian Huang,
Songtao Lu,
Maher Nouiehed,
Maziar Sanjabi,
Mingyi Hong
Abstract:
The min-max optimization problem, also known as the saddle point problem, is a classical optimization problem which is also studied in the context of zero-sum games. Given a class of objective functions, the goal is to find a value for the argument which leads to a small objective value even for the worst-case function in the given class. Min-max optimization problems have recently become very popular in a wide range of signal and data processing applications such as fair beamforming, training generative adversarial networks (GANs), and robust machine learning, to name just a few. The overarching goal of this article is to provide a survey of recent advances for an important subclass of min-max problems, where the minimization and maximization problems can be non-convex and/or non-concave. In particular, we will first present a number of applications to showcase the importance of such min-max problems; then we discuss key theoretical challenges, and provide a selective review of some exciting recent theoretical and algorithmic advances in tackling non-convex min-max problems. Finally, we will point out open questions and future research directions.
Submitted 18 August, 2020; v1 submitted 15 June, 2020;
originally announced June 2020.
-
Exploring the Connection Between Binary and Spiking Neural Networks
Authors:
Sen Lu,
Abhronil Sengupta
Abstract:
On-chip edge intelligence has necessitated the exploration of algorithmic techniques to reduce the compute requirements of current machine learning frameworks. This work aims to bridge the recent algorithmic progress in training Binary Neural Networks and Spiking Neural Networks, both of which are driven by the same motivation, yet synergies between the two have not been fully explored. We show that training Spiking Neural Networks in the extreme quantization regime results in near-full-precision accuracies on large-scale datasets like CIFAR-100 and ImageNet. An important implication of this work is that Binary Spiking Neural Networks can be enabled by "In-Memory" hardware accelerators catered for Binary Neural Networks without suffering any accuracy degradation due to binarization. We utilize standard training techniques for non-spiking networks to generate our spiking networks through a conversion process. We also perform an extensive empirical analysis and explore simple design-time and run-time optimization techniques for reducing inference latency of spiking networks (both for binary and full-precision models) by an order of magnitude over prior work.
Submitted 21 May, 2020; v1 submitted 23 February, 2020;
originally announced February 2020.
-
Sequential Monitoring of Changes in Housing Prices
Authors:
Lajos Horváth,
Zhenya Liu,
Shanglin Lu
Abstract:
We propose a sequential monitoring scheme to find structural breaks in real estate markets. The changes in the real estate prices are modeled by a combination of linear and autoregressive terms. The monitoring scheme is based on a detector and a suitably chosen boundary function. If the detector crosses the boundary function, a structural break is detected. We provide the asymptotics for the procedure under the stability null hypothesis and the stopping time under the change point alternative. Monte Carlo simulation is used to show the size and the power of our method under several conditions. We study the real estate markets in Boston, Los Angeles and at the national U.S. level. We find structural breaks in the markets, and we segment the data into stationary segments. It is observed that the autoregressive parameter is increasing but stays below 1.
Submitted 10 February, 2020;
originally announced February 2020.
-
Minimizing Dynamic Regret and Adaptive Regret Simultaneously
Authors:
Lijun Zhang,
Shiyin Lu,
Tianbao Yang
Abstract:
Regret minimization is treated as the golden rule in the traditional study of online learning. However, regret minimization algorithms tend to converge to the static optimum, thus being suboptimal for changing environments. To address this limitation, new performance measures, including dynamic regret and adaptive regret have been proposed to guide the design of online algorithms. The former one aims to minimize the global regret with respect to a sequence of changing comparators, and the latter one attempts to minimize every local regret with respect to a fixed comparator. Existing algorithms for dynamic regret and adaptive regret are developed independently, and only target one performance measure. In this paper, we bridge this gap by proposing novel online algorithms that are able to minimize the dynamic regret and adaptive regret simultaneously. In fact, our theoretical guarantee is even stronger in the sense that one algorithm is able to minimize the dynamic regret over any interval.
Submitted 5 February, 2020;
originally announced February 2020.
-
MixPath: A Unified Approach for One-shot Neural Architecture Search
Authors:
Xiangxiang Chu,
Shun Lu,
Xudong Li,
Bo Zhang
Abstract:
Blending multiple convolutional kernels has proved advantageous in neural architecture design. However, current two-stage neural architecture search methods are mainly limited to single-path search spaces. How to efficiently search models of multi-path structures remains a difficult problem. In this paper, we are motivated to train a one-shot multi-path supernet to accurately evaluate the candidate architectures. Specifically, we discover that in the studied search spaces, feature vectors summed from multiple paths are nearly multiples of those from a single path. Such disparity perturbs the supernet training and its ranking ability. Therefore, we propose a novel mechanism called Shadow Batch Normalization (SBN) to regularize the disparate feature statistics. Extensive experiments prove that SBNs are capable of stabilizing the optimization and improving ranking performance. We call our unified multi-path one-shot approach MixPath, which generates a series of models that achieve state-of-the-art results on ImageNet.
Submitted 19 July, 2023; v1 submitted 16 January, 2020;
originally announced January 2020.
-
Distributed Learning in the Non-Convex World: From Batch to Streaming Data, and Beyond
Authors:
Tsung-Hui Chang,
Mingyi Hong,
Hoi-To Wai,
Xinwei Zhang,
Songtao Lu
Abstract:
Distributed learning has become a critical enabler of the massively connected world envisioned by many. This article discusses four key elements of scalable distributed processing and real-time intelligence: problems, data, communication and computation. Our aim is to provide a fresh and unique perspective about how these elements should work together in an effective and coherent manner. In particular, we provide a selective review about the recent techniques developed for optimizing non-convex models (i.e., problem classes), processing batch and streaming data (i.e., data types), over the networks in a distributed manner (i.e., communication and computation paradigm). We describe the intuitions and connections behind a core set of popular distributed algorithms, emphasizing how to trade off between computation and communication costs. Practical issues and future research directions will also be discussed.
Submitted 14 January, 2020;
originally announced January 2020.
-
Self-adaption grey DBSCAN clustering
Authors:
Shizhan Lu
Abstract:
Clustering analysis, a classical issue in data mining, is widely used in various research areas. This article proposes a self-adaption grey DBSCAN clustering (SAG-DBSCAN) algorithm. First, the grey relational matrix is used to obtain the grey local density indicator. This indicator is then applied to perform self-adapting noise identification and obtain a dense subset of the clustering dataset. Finally, DBSCAN with automatic parameter selection is used to cluster the dense subset. Several frequently used datasets were used to demonstrate the performance and effectiveness of the proposed clustering algorithm and to compare the results with those of other state-of-the-art algorithms. The comprehensive comparisons indicate that our method has advantages over the other compared methods.
Submitted 23 December, 2019;
originally announced December 2019.
-
Learn Electronic Health Records by Fully Decentralized Federated Learning
Authors:
Songtao Lu,
Yawen Zhang,
Yunlong Wang,
Christina Mack
Abstract:
Federated learning opens a number of research opportunities due to its high communication efficiency in distributed training problems within a star network. In this paper, we focus on improving the communication efficiency for fully decentralized federated learning over a graph, where the algorithm performs local updates for several iterations and then enables communications among the nodes. In such a way, the communication rounds of exchanging the common interest of parameters can be saved significantly without loss of optimality of the solutions. Multiple numerical simulations based on large, real-world electronic health record databases showcase the superiority of the decentralized federated learning compared with classic methods.
Submitted 9 December, 2019; v1 submitted 3 December, 2019;
originally announced December 2019.
-
A clustered Gaussian process model for computer experiments
Authors:
Chih-Li Sung,
Benjamin Haaland,
Youngdeok Hwang,
Siyuan Lu
Abstract:
The Gaussian process has been one of the important approaches for emulating computer simulations. However, the stationarity assumption for a Gaussian process and its intractability for large-scale datasets limit its use in practice. In this article, we propose a clustered Gaussian process model which segments the input data into multiple clusters, in each of which a Gaussian process model is fitted. The stochastic expectation-maximization algorithm is employed to efficiently fit the model. In our simulations as well as a real application to solar irradiance emulation, our proposed method had smaller mean squared errors than its main competitors, with competitive computation time, and provided valuable insights from data by discovering the clusters. An R package for the proposed methodology is provided in an open repository.
Submitted 5 November, 2020; v1 submitted 11 November, 2019;
originally announced November 2019.
-
Statistical Modeling for Spatio-Temporal Data from Stochastic Convection-Diffusion Processes
Authors:
Xiao Liu,
Kyongmin Yeo,
Siyuan Lu
Abstract:
This paper proposes a physical-statistical modeling approach for spatio-temporal data arising from a class of stochastic convection-diffusion processes. Such processes are widely found in scientific and engineering applications where fundamental physics imposes critical constraints on how data can be modeled and how models should be interpreted. The idea of spectrum decomposition is employed to approximate a physical spatio-temporal process by the linear combination of spatial basis functions and a multivariate random process of spectral coefficients. Unlike existing approaches assuming spatially- and temporally-invariant convection-diffusion, this paper considers a more general scenario with spatially-varying convection-diffusion and nonzero-mean source-sink. As a result, the temporal dynamics of spectral coefficients is coupled with each other, which can be interpreted as the non-linear energy redistribution across multiple scales from the perspective of physics. Because of the spatially-varying convection-diffusion, the space-time covariance is non-stationary in space. The theoretical results are integrated into a hierarchical dynamical spatio-temporal model. The connection is established between the proposed model and the existing models based on Integro-Difference Equations. Computational efficiency and scalability are also investigated to make the proposed approach practical. The advantages of the proposed methodology are demonstrated by numerical examples, a case study, and comprehensive comparison studies. Computer code is available on GitHub.
△ Less
Submitted 5 August, 2020; v1 submitted 23 October, 2019;
originally announced October 2019.
-
No-regret Non-convex Online Meta-Learning
Authors:
Zhenxun Zhuang,
Yunlong Wang,
Kezi Yu,
Songtao Lu
Abstract:
The online meta-learning framework is designed for the continual lifelong learning setting. It bridges two fields: meta-learning, which tries to extract prior knowledge from past tasks for fast learning of future tasks, and online learning, which deals with the sequential setting where problems are revealed one by one. In this paper, we generalize the original framework from the convex to the non-convex setting, and introduce local regret as the alternative performance measure. We then apply this framework to stochastic settings, and show theoretically that it enjoys a logarithmic local regret and is robust to any hyperparameter initialization. An empirical test on a real-world task demonstrates its superiority over traditional methods.
Submitted 18 February, 2020; v1 submitted 22 October, 2019;
originally announced October 2019.
-
Improving the Sample and Communication Complexity for Decentralized Non-Convex Optimization: A Joint Gradient Estimation and Tracking Approach
Authors:
Haoran Sun,
Songtao Lu,
Mingyi Hong
Abstract:
Many modern large-scale machine learning problems benefit from decentralized and stochastic optimization. Recent works have shown that utilizing both decentralized computing and local stochastic gradient estimates can outperform state-of-the-art centralized algorithms in applications involving highly non-convex problems, such as training deep neural networks.
In this work, we propose a decentralized stochastic algorithm to deal with certain smooth non-convex problems where there are $m$ nodes in the system, and each node has a large number of samples (denoted as $n$). Unlike the majority of existing decentralized learning algorithms for either stochastic or finite-sum problems, we focus on reducing the total number of communication rounds among the nodes while accessing the minimum number of local data samples. In particular, we propose an algorithm named D-GET (decentralized gradient estimation and tracking), which jointly performs decentralized gradient estimation (which estimates the local gradient using a subset of local samples) and gradient tracking (which tracks the global full gradient using local estimates). We show that, to achieve an $ε$-stationary solution of the deterministic finite-sum problem, the proposed algorithm achieves an $\mathcal{O}(mn^{1/2}ε^{-1})$ sample complexity and an $\mathcal{O}(ε^{-1})$ communication complexity. These bounds significantly improve upon the best existing bounds of $\mathcal{O}(mnε^{-1})$ and $\mathcal{O}(ε^{-1})$, respectively. Similarly, for online problems, the proposed method achieves an $\mathcal{O}(m ε^{-3/2})$ sample complexity and an $\mathcal{O}(ε^{-1})$ communication complexity, while the best existing bounds are $\mathcal{O}(mε^{-2})$ and $\mathcal{O}(ε^{-2})$, respectively.
Submitted 13 October, 2019;
originally announced October 2019.
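The gradient-tracking component described in the abstract above can be illustrated with a minimal sketch. This is not the authors' D-GET implementation: it uses exact local gradients instead of the subsampled estimator, and the ring topology, the 1/3 mixing weights, the scalar quadratic local losses, the step size `alpha`, and all function names are assumptions chosen only for illustration.

```python
def ring_weights(m):
    """Symmetric, doubly stochastic mixing matrix for a ring (assumed topology):
    each node averages itself and its two neighbors with weight 1/3."""
    W = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in (i - 1, i, i + 1):
            W[i][j % m] += 1.0 / 3.0
    return W


def local_grad(target, x):
    """Gradient of the toy local loss f_i(x) = 0.5 * (x - target)**2."""
    return x - target


def gradient_tracking(targets, alpha=0.1, iters=300):
    """Each node i keeps an iterate x[i] and a tracker y[i] of the average gradient."""
    m = len(targets)
    W = ring_weights(m)
    x = [0.0] * m
    y = [local_grad(targets[i], x[i]) for i in range(m)]  # y starts at the local gradient
    for _ in range(iters):
        # Mix with neighbors, then step along the tracked gradient.
        x_new = [sum(W[i][j] * x[j] for j in range(m)) - alpha * y[i]
                 for i in range(m)]
        # Mix trackers, then add the change in the local gradient.
        y = [sum(W[i][j] * y[j] for j in range(m))
             + local_grad(targets[i], x_new[i]) - local_grad(targets[i], x[i])
             for i in range(m)]
        x = x_new
    return x


targets = [1.0, 2.0, 4.0, 9.0]          # hypothetical local data, one value per node
x_nodes = gradient_tracking(targets)    # every node should approach mean(targets) = 4.0
```

In this toy problem the minimizer of the summed losses is the mean of `targets`, so agreement of all nodes on that value indicates that the trackers recovered the global gradient from purely local exchanges.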
-
Min-Max Optimization without Gradients: Convergence and Applications to Adversarial ML
Authors:
Sijia Liu,
Songtao Lu,
Xiangyi Chen,
Yao Feng,
Kaidi Xu,
Abdullah Al-Dujaili,
Mingyi Hong,
Una-May O'Reilly
Abstract:
In this paper, we study the problem of constrained robust (min-max) optimization in a black-box setting, where the desired optimizer cannot access the gradients of the objective function but may query its values. We present a principled optimization framework, integrating a zeroth-order (ZO) gradient estimator with an alternating projected stochastic gradient descent-ascent method, where the former only requires a small number of function queries and the latter needs just a one-step descent/ascent update. We show that the proposed framework, referred to as ZO-Min-Max, has a sub-linear convergence rate under mild conditions and scales gracefully with problem size. From an application side, we explore a promising connection between black-box min-max optimization and black-box evasion and poisoning attacks in adversarial machine learning (ML). Our empirical evaluations on these use cases demonstrate the effectiveness of our approach and its scalability to dimensions that prohibit using recent black-box solvers.
Submitted 16 June, 2020; v1 submitted 30 September, 2019;
originally announced September 2019.
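As a rough illustration of the idea in the abstract above, the sketch below combines a query-only gradient estimate with alternating projected descent-ascent steps. It is a sketch under stated assumptions, not the paper's ZO-Min-Max algorithm: the Gaussian random directions, the toy saddle objective, the box constraint, the step sizes, and every function name are hypothetical choices made for this example.

```python
import random


def zo_grad(f, z, rng, mu=1e-4, q=20):
    """Two-point randomized gradient estimate of f at z from function values only.
    Gaussian directions are an assumption of this sketch."""
    d = len(z)
    g = [0.0] * d
    fz = f(z)
    for _ in range(q):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        z_plus = [zi + mu * ui for zi, ui in zip(z, u)]
        scale = (f(z_plus) - fz) / mu       # finite-difference slope along u
        for i in range(d):
            g[i] += scale * u[i] / q
    return g


def project_box(z, lo=-1.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^d (assumed constraint set)."""
    return [min(hi, max(lo, zi)) for zi in z]


def objective(x, y):
    """Toy saddle function: strongly convex in x, strongly concave in y, saddle at 0."""
    return (0.5 * sum(xi * xi for xi in x)
            + sum(xi * yi for xi, yi in zip(x, y))
            - 0.5 * sum(yi * yi for yi in y))


def zo_min_max(x, y, steps=500, alpha=0.05, beta=0.05, seed=0):
    rng = random.Random(seed)
    for _ in range(steps):
        gx = zo_grad(lambda v: objective(v, y), x, rng)
        x = project_box([xi - alpha * gi for xi, gi in zip(x, gx)])  # descent on x
        gy = zo_grad(lambda v: objective(x, v), y, rng)
        y = project_box([yi + beta * gi for yi, gi in zip(y, gy)])   # ascent on y
    return x, y


x_final, y_final = zo_min_max([0.8, -0.5], [0.3, 0.9])
```

Each iteration costs `2 * (q + 1)` function queries here; no derivative of `objective` is ever evaluated, which is the point of the black-box setting.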
-
Adaptive and Efficient Algorithms for Tracking the Best Expert
Authors:
Shiyin Lu,
Lijun Zhang
Abstract:
In this paper, we consider the problem of prediction with expert advice in dynamic environments. We choose tracking regret as the performance metric and develop two adaptive and efficient algorithms with data-dependent tracking regret bounds. The first algorithm achieves a second-order tracking regret bound, which improves existing first-order bounds. The second algorithm enjoys a path-length bound, which is generally not comparable to the second-order bound but offers advantages in slowly moving environments. Both algorithms are developed under the online mirror descent framework and draw inspiration from existing algorithms that attain data-dependent bounds of static regret. The key idea is to use a clipped simplex in the updating step of online mirror descent. Finally, we extend our algorithms and analysis to online matrix prediction and provide the first data-dependent tracking regret bound for this problem.
Submitted 8 February, 2020; v1 submitted 4 September, 2019;
originally announced September 2019.
-
SNAP: Finding Approximate Second-Order Stationary Solutions Efficiently for Non-convex Linearly Constrained Problems
Authors:
Songtao Lu,
Meisam Razaviyayn,
Bo Yang,
Kejun Huang,
Mingyi Hong
Abstract:
This paper proposes low-complexity algorithms for finding approximate second-order stationary points (SOSPs) of problems with a smooth non-convex objective and linear constraints. While finding (approximate) SOSPs is computationally intractable, we first show that generic instances of the problem can be solved efficiently. More specifically, for a generic problem instance, a certain strict complementarity (SC) condition holds for all Karush-Kuhn-Tucker (KKT) solutions (with probability one). The SC condition is then used to establish an equivalence relationship between two different notions of SOSPs, one of which is computationally easy to verify. Based on this particular notion of SOSP, we design an algorithm named the Successive Negative-curvature grAdient Projection (SNAP), which successively performs either conventional gradient projection or some negative curvature based projection steps to find SOSPs. SNAP and its first-order extension SNAP$^+$ require $\mathcal{O}(1/ε^{2.5})$ iterations to compute an $(ε, \sqrt{ε})$-SOSP, and their per-iteration computational complexities are polynomial in the number of constraints and problem dimension. To our knowledge, this is the first time that first-order algorithms with polynomial per-iteration complexity and global sublinear rate have been designed to find SOSPs of the important class of non-convex problems with linear constraints.
Submitted 9 July, 2019;
originally announced July 2019.
-
Clustering by the way of atomic fission
Authors:
Shizhan Lu
Abstract:
Cluster analysis, which focuses on the grouping and categorization of similar elements, is widely used in various fields of research. Inspired by the phenomenon of atomic fission, this paper proposes a novel density-based clustering algorithm called fission clustering (FC). It focuses on mining the dense families of a dataset and utilizes the information in the distance matrix to fissure the dataset into subsets. When a dataset has a few points surrounding the dense families of clusters, a K-nearest-neighbors local density indicator is applied to distinguish and remove the points in sparse areas so as to obtain a dense subset constituted by the dense families of clusters. A number of frequently used datasets were used to test the performance of this clustering approach and to compare its results with those of other algorithms. The proposed algorithm is found to outperform other algorithms in speed and accuracy.
Submitted 26 June, 2019;
originally announced June 2019.
-
Neurally-Guided Structure Inference
Authors:
Sidi Lu,
Jiayuan Mao,
Joshua B. Tenenbaum,
Jiajun Wu
Abstract:
Most structure inference methods either rely on exhaustive search or are purely data-driven. Exhaustive search robustly infers the structure of arbitrarily complex data, but it is slow. Data-driven methods allow efficient inference, but do not generalize when test data have more complex structures than training data. In this paper, we propose a hybrid inference algorithm, the Neurally-Guided Structure Inference (NG-SI), keeping the advantages of both search-based and data-driven methods. The key idea of NG-SI is to use a neural network to guide the hierarchical, layer-wise search over the compositional space of structures. We evaluate our algorithm on two representative structure inference tasks: probabilistic matrix decomposition and symbolic program parsing. It outperforms data-driven and search-based alternatives on both tasks.
Submitted 15 August, 2019; v1 submitted 17 June, 2019;
originally announced June 2019.
-
Multi-Objective Generalized Linear Bandits
Authors:
Shiyin Lu,
Guanghui Wang,
Yao Hu,
Lijun Zhang
Abstract:
In this paper, we study the multi-objective bandits (MOB) problem, where a learner repeatedly selects one arm to play and then receives a reward vector consisting of multiple objectives. MOB has found many real-world applications as varied as online recommendation and network routing. On the other hand, these applications typically contain contextual information that can guide the learning process which, however, is ignored by most existing work. To utilize this information, we associate each arm with a context vector and assume the reward follows the generalized linear model (GLM). We adopt the notion of Pareto regret to evaluate the learner's performance and develop a novel algorithm for minimizing it. The essential idea is to apply a variant of the online Newton step to estimate model parameters, based on which we utilize the upper confidence bound (UCB) policy to construct an approximation of the Pareto front, and then choose one arm uniformly at random from the approximate Pareto front. Theoretical analysis shows that the proposed algorithm achieves an $\tilde O(d\sqrt{T})$ Pareto regret, where $T$ is the time horizon and $d$ is the dimension of contexts, which matches the optimal result for the single-objective contextual bandit problem. Numerical experiments demonstrate the effectiveness of our method.
Submitted 30 May, 2019;
originally announced May 2019.
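The final selection step mentioned in the abstract above (build an approximate Pareto front from per-arm score vectors, then pick one arm uniformly at random) can be sketched as follows. The score values are made up, and the functions `dominates`, `pareto_front`, and `choose_arm` are hypothetical names for this illustration; how the UCB scores themselves are computed in the paper is not reproduced here.

```python
import random


def dominates(a, b):
    """True if a Pareto-dominates b: at least as good in every objective
    and strictly better in at least one (larger is better)."""
    return (all(ai >= bi for ai, bi in zip(a, b))
            and any(ai > bi for ai, bi in zip(a, b)))


def pareto_front(scores):
    """Indices of arms whose score vectors are not dominated by any other arm."""
    return [i for i, s in enumerate(scores)
            if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)]


def choose_arm(scores, rng=random):
    """Pick one arm uniformly at random from the (approximate) Pareto front."""
    return rng.choice(pareto_front(scores))


# Hypothetical two-objective upper-confidence scores for five arms.
ucb_scores = [(0.9, 0.2), (0.5, 0.5), (0.4, 0.4), (0.1, 0.95), (0.5, 0.3)]
front = pareto_front(ucb_scores)   # arms 2 and 4 are dominated by arm 1
arm = choose_arm(ucb_scores, random.Random(0))
```

The pairwise check is quadratic in the number of arms, which is adequate for a small arm set; choosing uniformly over the front avoids favoring any single objective.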
-
Noisy and Incomplete Boolean Matrix Factorization via Expectation Maximization
Authors:
Lifan Liang,
Songjian Lu
Abstract:
A probabilistic approach to Boolean matrix factorization can provide solutions robust against noise and missing values with linear computational complexity. However, the assumption about latent factors can be problematic in real-world applications. This study proposed a new probabilistic algorithm free of assumptions about latent factors, while retaining the advantages of previous algorithms. Real-data experiments showed that our algorithm compared favourably with current state-of-the-art probabilistic algorithms.
Submitted 29 May, 2019;
originally announced May 2019.
-
Adaptivity and Optimality: A Universal Algorithm for Online Convex Optimization
Authors:
Guanghui Wang,
Shiyin Lu,
Lijun Zhang
Abstract:
In this paper, we study adaptive online convex optimization, and aim to design a universal algorithm that achieves optimal regret bounds for multiple common types of loss functions. Existing universal methods are limited in the sense that they are optimal for only a subclass of loss functions. To address this limitation, we propose a novel online method, namely Maler, which enjoys the optimal $O(\sqrt{T})$, $O(d\log T)$ and $O(\log T)$ regret bounds for general convex, exponentially concave, and strongly convex functions respectively. The essential idea is to run multiple types of learning algorithms with different learning rates in parallel, and utilize a meta algorithm to track the best one on the fly. Empirical results demonstrate the effectiveness of our method.
Submitted 14 May, 2019;
originally announced May 2019.
-
SAdam: A Variant of Adam for Strongly Convex Functions
Authors:
Guanghui Wang,
Shiyin Lu,
Weiwei Tu,
Lijun Zhang
Abstract:
The Adam algorithm has become extremely popular for large-scale machine learning. Under the convexity condition, it has been proved to enjoy a data-dependent $O(\sqrt{T})$ regret bound where $T$ is the time horizon. However, whether strong convexity can be utilized to further improve the performance remains an open problem. In this paper, we give an affirmative answer by developing a variant of Adam (referred to as SAdam) which achieves a data-dependent $O(\log T)$ regret bound for strongly convex functions. The essential idea is to maintain a faster-decaying yet still controlled step size for exploiting strong convexity. In addition, under a special configuration of hyperparameters, our SAdam reduces to SC-RMSprop, a recently proposed variant of RMSprop for strongly convex functions, for which we provide the first data-dependent logarithmic regret bound. Empirical results on optimizing strongly convex functions and training deep networks demonstrate the effectiveness of our method.
Submitted 8 May, 2019;
originally announced May 2019.
-
Cluster Developing 1-Bit Matrix Completion
Authors:
Chengkun Zhang,
Junbin Gao,
Stephen Lu
Abstract:
Matrix completion has long been used as the core technique of recommender systems. In particular, 1-bit matrix completion, which treats prediction as a ``Recommended'' or ``Not Recommended'' question, has proved its significance and validity in the field. However, while customers and products aggregate into interacting clusters, state-of-the-art model-based 1-bit recommender systems do not take grouping bias into consideration. To close this gap, this paper introduced Group-Specific 1-bit Matrix Completion (GS1MC), which for the first time consolidates group-specific effects into 1-bit recommender systems under the low-rank latent variable framework. Additionally, to make GS1MC usable even when grouping information is unobtainable, Cluster Developing Matrix Completion (CDMC) was proposed by integrating the sparse subspace clustering technique into GS1MC. Namely, CDMC clusters users/items and leverages their group effects in matrix completion at the same time. Experiments on synthetic and real-world data show that GS1MC outperforms the current 1-bit matrix completion methods. Meanwhile, it is compelling that CDMC can successfully capture items' genre features based only on sparse binary user-item interaction data. Notably, GS1MC provides new insight into incorporating and evaluating the efficacy of clustering methods, while CDMC can serve as a new tool for exploring unrevealed social behavior or market phenomena.
Submitted 7 April, 2019;
originally announced April 2019.
-
Signal Demodulation with Machine Learning Methods for Physical Layer Visible Light Communications: Prototype Platform, Open Dataset and Algorithms
Authors:
Shuai Ma,
Jiahui Dai,
Songtao Lu,
Hang Li,
Han Zhang,
Chun Du,
Shiyin Li
Abstract:
In this paper, we investigate the design and implementation of machine learning (ML) based demodulation methods in the physical layer of visible light communication (VLC) systems. We build a flexible hardware prototype of an end-to-end VLC system, from which the received signals are collected as the real data. The dataset, which contains eight types of modulated signals, is available online. Then, we propose three ML demodulators based on convolutional neural network (CNN), deep belief network (DBN), and adaptive boosting (AdaBoost), respectively. Specifically, the CNN based demodulator converts the modulated signals to images and recognizes the signals by image classification. The proposed DBN based demodulator contains three restricted Boltzmann machines (RBMs) to extract the modulation features. The AdaBoost method includes a strong classifier that is constructed from weak classifiers using the k-nearest neighbor (KNN) algorithm. These three demodulators are trained and tested on our online open dataset. Experimental results show that the demodulation accuracy of the three data-driven demodulators drops as the transmission distance increases. A higher modulation order negatively influences the accuracy for a given transmission distance. Among the three ML methods, the AdaBoost demodulator achieves the best performance.
Submitted 13 March, 2019;
originally announced March 2019.
-
Deep Learning for Signal Demodulation in Physical Layer Wireless Communications: Prototype Platform, Open Dataset, and Analytics
Authors:
Hongmei Wang,
Zhenzhen Wu,
Shuai Ma,
Songtao Lu,
Han Zhang,
Guoru Ding,
Shiyin Li
Abstract:
In this paper, we investigate deep learning (DL)-enabled signal demodulation methods and establish the first open dataset of real modulated signals for wireless communication systems. Specifically, we propose a flexible communication prototype platform for measuring a real modulation dataset. Then, based on the measured dataset, two DL-based demodulators, called the deep belief network (DBN)-support vector machine (SVM) demodulator and the adaptive boosting (AdaBoost) based demodulator, are proposed. The proposed DBN-SVM based demodulator exploits the advantages of both DBN and SVM, i.e., the advantage of DBN as a feature extractor and SVM as a feature classifier. In the DBN-SVM based demodulator, the received signals are normalized before being fed to the DBN network. Furthermore, an AdaBoost based demodulator is developed, which employs the $k$-Nearest Neighbor (KNN) as a weak classifier to form a strong combined classifier. Finally, experimental results indicate that the proposed DBN-SVM based demodulator and AdaBoost based demodulator are superior to the single classification methods using DBN, SVM, and the maximum likelihood (MLD) based demodulator.
Submitted 8 March, 2019;
originally announced March 2019.
-
Hybrid Block Successive Approximation for One-Sided Non-Convex Min-Max Problems: Algorithms and Applications
Authors:
Songtao Lu,
Ioannis Tsaknakis,
Mingyi Hong,
Yongxin Chen
Abstract:
The min-max problem, also known as the saddle point problem, is a class of optimization problems which minimizes and maximizes two subsets of variables simultaneously. This class of problems can be used to formulate a wide range of signal processing and communication (SPCOM) problems. Despite its popularity, most existing theory for this class has been mainly developed for problems with certain special convex-concave structure. Therefore, it cannot be used to guide the algorithm design for many interesting problems in SPCOM, where various kinds of non-convexity arise.
In this work, we consider a block-wise one-sided non-convex min-max problem, in which the minimization problem consists of multiple blocks and is non-convex, while the maximization problem is (strongly) concave. We propose a class of simple algorithms named Hybrid Block Successive Approximation (HiBSA), which alternately perform gradient descent-type steps for the minimization blocks and gradient ascent-type steps for the maximization problem. A key element in the proposed algorithm is the use of certain regularization and penalty sequences, which stabilize the algorithm and ensure convergence. We show that HiBSA converges to some properly defined first-order stationary solutions with quantifiable global rates. To validate the efficiency of the proposed algorithms, we conduct numerical tests on a number of problems, including the robust learning problem, the non-convex min-utility maximization problems, and a certain wireless jamming problem arising in interfering channels.
Submitted 16 March, 2021; v1 submitted 21 February, 2019;
originally announced February 2019.