Search | arXiv e-print repository

arXiv:2405.14459 [pdf, other]

Semi-Discrete Optimal Transport: Nearly Minimax Estimation With Stochastic Gradient Descent and Adaptive Entropic Regularization

Authors: Ferdinand Genans, Antoine Godichon-Baggioni, François-Xavier Vialard, Olivier Wintenberger

Abstract: Optimal Transport (OT) based distances are powerful tools for machine learning to compare probability measures and manipulate them using OT maps. In this field, a setting of interest is semi-discrete OT, where the source measure $\mu$ is continuous, while the target $\nu$ is discrete. Recent works have shown that the minimax rate for the OT map is $\mathcal{O}(t^{-1/2})$ when using $t$ i.i.d. subsamples from each measure (two-sample setting). An open question is whether a better convergence rate can be achieved when the full information of the discrete measure $\nu$ is known (one-sample setting). In this work, we answer this question positively by (i) proving an $\mathcal{O}(t^{-1})$ lower bound rate for the OT map, using the similarity between Laguerre cells estimation and density support estimation, and (ii) proposing a Stochastic Gradient Descent (SGD) algorithm with adaptive entropic regularization and averaging acceleration. To nearly achieve the desired fast rate, characteristic of non-regular parametric problems, we design an entropic regularization scheme decreasing with the number of samples. Another key step in our algorithm consists of using a projection step that makes it possible to leverage the local strong convexity of the regularized OT problem. Our convergence analysis integrates online convex optimization and stochastic gradient techniques, complemented by the specificities of the OT semi-dual. Moreover, while being as computationally and memory efficient as vanilla SGD, our algorithm achieves the unusual fast rates of our theory in numerical experiments.

Submitted 24 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.01908 [pdf, other]

A Full Adagrad algorithm with O(Nd) operations

Authors: Antoine Godichon-Baggioni, Wei Lu, Bruno Portier

Abstract: A novel approach is given to overcome the computational challenges of the full-matrix Adaptive Gradient algorithm (Full AdaGrad) in stochastic optimization. By developing a recursive method that estimates the inverse of the square root of the covariance of the gradient, alongside a streaming variant for parameter updates, the study offers efficient and practical algorithms for large-scale applications. This innovative strategy significantly reduces the complexity and resource demands typically associated with full-matrix methods, enabling more effective optimization processes. Moreover, the convergence rates of the proposed estimators and their asymptotic efficiency are given. Their effectiveness is demonstrated through numerical studies.

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2404.19496 [pdf, other]

Online and Offline Robust Multivariate Linear Regression

Authors: Antoine Godichon-Baggioni, Stephane S. Robin, Laure Sansonnet

Abstract: We consider the robust estimation of the parameters of multivariate Gaussian linear regression models. To this aim we consider robust versions of the usual (Mahalanobis) least-squares criterion, with or without Ridge regularization. We introduce two methods for each considered contrast: (i) online stochastic gradient descent algorithms and their averaged versions and (ii) offline fixed-point algorithms. Under weak assumptions, we prove the asymptotic normality of the resulting estimates. Because the variance matrix of the noise is usually unknown, we propose to plug a robust estimate of it into the Mahalanobis-based stochastic gradient descent algorithms. We show, on synthetic data, the dramatic gain in terms of robustness of the proposed estimates as compared to the classical least-squares ones. We also show the computational efficiency of the online versions of the proposed algorithms. All the proposed algorithms are implemented in the R package RobRegression available on CRAN.

Submitted 30 April, 2024; originally announced April 2024.

arXiv:2402.02857 [pdf, other]

Non-asymptotic Analysis of Biased Adaptive Stochastic Approximation

Authors: Sobihan Surendran, Antoine Godichon-Baggioni, Adeline Fermanian, Sylvain Le Corff

Abstract: Stochastic Gradient Descent (SGD) with adaptive steps is now widely used for training deep neural networks. Most theoretical results assume access to unbiased gradient estimators, which is not the case in several recent deep learning and reinforcement learning applications that use Monte Carlo methods. This paper provides a comprehensive non-asymptotic analysis of SGD with biased gradients and adaptive steps for convex and non-convex smooth functions. Our study incorporates time-dependent bias and emphasizes the importance of controlling the bias and Mean Squared Error (MSE) of the gradient estimator. In particular, we establish that Adagrad and RMSProp with biased gradients converge to critical points for smooth non-convex functions at a rate similar to existing results in the literature for the unbiased case. Finally, we provide experimental results using Variational Autoencoders (VAE) that illustrate our convergence results and show how the effect of bias can be reduced by appropriate hyperparameter tuning.

Submitted 5 February, 2024; originally announced February 2024.

arXiv:2401.10923 [pdf, other]

Online estimation of the inverse of the Hessian for stochastic optimization with application to universal stochastic Newton algorithms

Authors: Antoine Godichon-Baggioni, Wei Lu, Bruno Portier

Abstract: This paper addresses second-order stochastic optimization for estimating the minimizer of a convex function written as an expectation. A direct recursive estimation technique for the inverse Hessian matrix using a Robbins-Monro procedure is introduced. This approach drastically reduces computational complexity. Above all, it makes it possible to develop universal stochastic Newton methods and to investigate the asymptotic efficiency of the proposed approach. This work thus expands the application scope of second-order algorithms in stochastic optimization.

Submitted 4 July, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

arXiv:2312.09633 [pdf, other]

Natural Gradient Variational Bayes without Fisher Matrix Analytic Calculation and Its Inversion

Authors: A. Godichon-Baggioni, D. Nguyen, M-N Tran

Abstract: This paper introduces a method for efficiently approximating the inverse of the Fisher information matrix, a crucial step in achieving effective variational Bayes inference. A notable aspect of our approach is the avoidance of analytically computing the Fisher information matrix and its explicit inversion. Instead, we introduce an iterative procedure for generating a sequence of matrices that converge to the inverse of Fisher information. The natural gradient variational Bayes algorithm without analytic expression of the Fisher matrix and its inversion is provably convergent and achieves a convergence rate of order O(log s/s), with s the number of iterations. We also obtain a central limit theorem for the iterates. Implementation of our method does not require storage of large matrices, and achieves a linear complexity in the number of variational parameters. Our algorithm exhibits versatility, making it applicable across a diverse array of variational Bayes domains, including Gaussian approximation and normalizing flow Variational Bayes. We offer a range of numerical examples to demonstrate the efficiency and reliability of the proposed variational Bayes method.

Submitted 26 April, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

Comments: 43 pages

arXiv:2311.17753 [pdf, other]

On Adaptive Stochastic Optimization for Streaming Data: A Newton's Method with O(dN) Operations

Authors: Antoine Godichon-Baggioni, Nicklas Werge

Abstract: Stochastic optimization methods encounter new challenges in the realm of streaming, characterized by a continuous flow of large, high-dimensional data. While first-order methods, like stochastic gradient descent, are the natural choice, they often struggle with ill-conditioned problems. In contrast, second-order methods, such as Newton's methods, offer a potential solution, but their computational demands render them impractical. This paper introduces adaptive stochastic optimization methods that address ill-conditioned problems while functioning in a streaming context. Notably, we present an adaptive inversion-free Newton's method with a computational complexity matching that of first-order methods, $\mathcal{O}(dN)$, where $d$ represents the number of dimensions/features, and $N$ the number of data points. Theoretical analysis confirms their asymptotic efficiency, and empirical evidence demonstrates their effectiveness, especially in scenarios involving complex covariance structures and challenging initializations. In particular, our adaptive Newton's methods outperform existing methods, while maintaining favorable computational efficiency.

Submitted 1 February, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

arXiv:2309.11916 [pdf, other]

A mixture of ellipsoidal densities for 3D data modelling

Authors: Denis Brazey, Antoine Godichon-Baggioni, Bruno Portier

Abstract: In this paper, we propose a new ellipsoidal mixture model. This model is based on a new probability density function belonging to the family of elliptical distributions and designed to model points spread around an ellipsoidal surface. Then, we consider a mixture model based on this density, whose parameters are estimated with the help of an EM algorithm. The properties of the estimates are studied theoretically and empirically. The algorithm is compared to a state-of-the-art ellipse fitting method and tested on 3D data.

Submitted 21 September, 2023; originally announced September 2023.

arXiv:2304.00770 [pdf, other]

Online stochastic Newton methods for estimating the geometric median and applications

Authors: Antoine Godichon-Baggioni, Wei Lu

Abstract: In the context of large samples, a small number of individuals might spoil basic statistical indicators like the mean. It is difficult to automatically detect these atypical individuals, and an alternative strategy is using robust approaches. This paper focuses on estimating the geometric median of a random variable, which is a robust indicator of central tendency. In order to deal with large samples of data arriving sequentially, online stochastic Newton algorithms for estimating the geometric median are introduced and we give their rates of convergence. Since estimates of the median and those of the Hessian matrix can be recursively updated, we also determine confidence intervals of the median in any designated direction and perform online statistical tests.

Submitted 3 April, 2023; originally announced April 2023.

arXiv:2303.01370 [pdf, ps, other]

Non asymptotic analysis of Adaptive stochastic gradient algorithms and applications

Authors: Antoine Godichon-Baggioni, Pierre Tarrago

Abstract: In stochastic optimization, a common tool to deal sequentially with large samples is the well-known stochastic gradient algorithm. Nevertheless, since the step sequence is the same for each direction, this can lead to bad results in practice for ill-conditioned problems. To overcome this, adaptive gradient algorithms such as Adagrad or Stochastic Newton algorithms should be preferred. This paper is devoted to the non-asymptotic analysis of these adaptive gradient algorithms for strongly convex objectives. All the theoretical results are adapted to linear regression and regularized generalized linear models for both Adagrad and Stochastic Newton algorithms.

Submitted 1 March, 2023; originally announced March 2023.
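
As a rough illustration of the per-direction step sizes mentioned in the abstract above, here is a minimal sketch of the standard diagonal Adagrad update. It is not taken from the paper: the function names, the quadratic test function, and the use of exact (non-stochastic) gradients are assumptions made only to keep the example short.

```python
import math


def adagrad_minimize(gradient, start, learning_rate=0.5, steps=100, eps=1e-8):
    """Diagonal Adagrad: each coordinate is scaled by its own gradient history."""
    theta = list(start)
    squared_sum = [0.0] * len(theta)  # running sum of squared gradients per coordinate
    for _ in range(steps):
        grad = gradient(theta)
        for i, g in enumerate(grad):
            squared_sum[i] += g * g
            # Coordinate-wise step: large past gradients shrink the step size.
            theta[i] -= learning_rate * g / (math.sqrt(squared_sum[i]) + eps)
    return theta


def ill_conditioned_gradient(theta):
    # Exact gradient of f(theta) = 0.5 * (100 * theta[0]**2 + theta[1]**2),
    # a hypothetical test function whose curvatures differ by a factor of 100.
    return [100.0 * theta[0], theta[1]]


solution = adagrad_minimize(ill_conditioned_gradient, start=[1.0, 1.0])
print(solution)
```

On this toy problem both coordinates shrink at essentially the same pace despite the different curvatures, which is the behavior a single shared step sequence does not provide.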

arXiv:2211.08131 [pdf, other]

A robust model-based clustering based on the geometric median and the Median Covariation Matrix

Authors: Antoine Godichon-Baggioni, Stéphane Robin

Abstract: Grouping observations into homogeneous groups is a recurrent task in statistical data analysis. We consider Gaussian Mixture Models, which are the most famous parametric model-based clustering method. We propose a new robust approach for model-based clustering, which consists in a modification of the EM algorithm (more specifically, the M-step) by replacing the estimates of the mean and the variance by robust versions based on the median and the median covariation matrix. All the proposed methods are available in the R package RGMM accessible on CRAN.

Submitted 15 November, 2022; originally announced November 2022.

arXiv:2209.03597 [pdf, other]

A penalized criterion for selecting the number of clusters for K-medians

Authors: Antoine Godichon-Baggioni, Sobihan Surendran

Abstract: Clustering is a usual unsupervised machine learning technique for grouping the data points into groups based upon similar features. We focus here on unsupervised clustering for contaminated data, i.e., in the case where K-medians should be preferred to K-means because of its robustness. More precisely, we concentrate on a common question in clustering: how to choose the number of clusters? The answer proposed here is to consider the choice of the optimal number of clusters as the minimization of a risk function via penalization. In this paper, we obtain a suitable penalty shape for our criterion and derive an associated oracle-type inequality. Finally, the performance of this approach with different types of K-medians algorithms is compared with other popular techniques in a simulation study. All studied algorithms are available in the R package Kmedians on CRAN.

Submitted 27 February, 2024; v1 submitted 8 September, 2022; originally announced September 2022.

arXiv:2205.12549 [pdf, other]

Learning from time-dependent streaming data with online stochastic algorithms

Authors: Antoine Godichon-Baggioni, Nicklas Werge, Olivier Wintenberger

Abstract: This paper addresses stochastic optimization in a streaming setting with time-dependent and biased gradient estimates. We analyze several first-order methods, including Stochastic Gradient Descent (SGD), mini-batch SGD, and time-varying mini-batch SGD, along with their Polyak-Ruppert averages. Our non-asymptotic analysis establishes novel heuristics that link dependence, biases, and convexity levels, enabling accelerated convergence. Specifically, our findings demonstrate that (i) time-varying mini-batch SGD methods have the capability to break long- and short-range dependence structures, (ii) biased SGD methods can achieve comparable performance to their unbiased counterparts, and (iii) incorporating Polyak-Ruppert averaging can accelerate the convergence of the stochastic optimization algorithms. To validate our theoretical findings, we conduct a series of experiments using both simulated and real-life time-dependent data.

Submitted 18 July, 2023; v1 submitted 25 May, 2022; originally announced May 2022.

arXiv:2109.07117 [pdf, other]

doi 10.1051/ps/2023006

Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Streaming Data

Authors: Antoine Godichon-Baggioni, Nicklas Werge, Olivier Wintenberger

Abstract: We introduce a streaming framework for analyzing stochastic approximation/optimization problems. This streaming framework is analogous to solving optimization problems using time-varying mini-batches that arrive sequentially. We provide non-asymptotic convergence rates of various gradient-based algorithms; this includes the famous Stochastic Gradient (SG) descent (a.k.a. Robbins-Monro algorithm), mini-batch SG and time-varying mini-batch SG algorithms, as well as their iterated averages (a.k.a. Polyak-Ruppert averaging). We show i) how to accelerate convergence by choosing the learning rate according to the time-varying mini-batches, ii) that Polyak-Ruppert averaging achieves optimal convergence in terms of attaining the Cramer-Rao lower bound, and iii) how time-varying mini-batches together with Polyak-Ruppert averaging can provide variance reduction and accelerate convergence simultaneously, which is advantageous for many learning problems, such as online, sequential, and large-scale learning. We further demonstrate these favorable effects for various time-varying mini-batches.

Submitted 24 April, 2023; v1 submitted 15 September, 2021; originally announced September 2021.
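
To make the combination of time-varying mini-batches and Polyak-Ruppert averaging in the abstract above more concrete, the following is a minimal sketch on a toy problem (estimating a mean by minimizing $E[(\theta - X)^2]/2$). The step-size schedule, the uniform averaging of iterates, the growing batch sizes, and all names are illustrative assumptions, not the schedules or weights analyzed in the paper.

```python
import random


def averaged_minibatch_sgd(batches, step_scale=1.0, step_exponent=0.75):
    """Mini-batch SGD with a running (Polyak-Ruppert) average of the iterates."""
    theta = 0.0      # current SGD iterate
    theta_bar = 0.0  # running average of the iterates
    for t, batch in enumerate(batches, start=1):
        gamma = step_scale * t ** (-step_exponent)  # decreasing learning rate
        # Gradient of 0.5 * (theta - x)^2, averaged over the current mini-batch.
        gradient = sum(theta - x for x in batch) / len(batch)
        theta -= gamma * gradient
        theta_bar += (theta - theta_bar) / t        # recursive average update
    return theta, theta_bar


rng = random.Random(0)
# Hypothetical stream: the t-th mini-batch contains t Gaussian samples with mean 2.
batches = [[rng.gauss(2.0, 1.0) for _ in range(t)] for t in range(1, 201)]
last_iterate, averaged_iterate = averaged_minibatch_sgd(batches)
print(last_iterate, averaged_iterate)
```

Both the last iterate and the averaged iterate should end up near the true mean of 2; the averaged one is updated recursively, so no past iterates need to be stored.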

arXiv:2107.12058 [pdf, ps, other]

Convergence in quadratic mean of averaged stochastic gradient algorithms without strong convexity nor bounded gradient

Authors: Antoine Godichon-Baggioni

Abstract: Online averaged stochastic gradient algorithms are increasingly studied since (i) they can deal quickly with large samples taking values in high-dimensional spaces, (ii) they can process data sequentially, and (iii) they are known to be asymptotically efficient. In this paper, we focus on giving explicit bounds on the quadratic mean error of the estimates under very weak assumptions, i.e., without supposing that the function we would like to minimize is strongly convex or admits a bounded gradient.

Submitted 26 July, 2021; originally announced July 2021.

arXiv:2011.09706 [pdf, other]

doi 10.1007/s10589-022-00442-3

On the asymptotic rate of convergence of Stochastic Newton algorithms and their Weighted Averaged versions

Authors: Claire Boyer, Antoine Godichon-Baggioni

Abstract: The majority of machine learning methods can be regarded as the minimization of an unavailable risk function. To optimize the latter, given samples provided in a streaming fashion, we define a general stochastic Newton algorithm and its weighted average version. In several use cases, both implementations will be shown not to require the inversion of a Hessian estimate at each iteration, but a direct update of the estimate of the inverse Hessian instead will be favored. This generalizes a trick introduced in [2] for the specific case of logistic regression, by directly updating the estimate of the inverse Hessian. Under mild assumptions such as local strong convexity at the optimum, we establish almost sure convergences and rates of convergence of the algorithms, as well as central limit theorems for the constructed parameter estimates. The unified framework considered in this paper covers the case of linear, logistic or softmax regressions to name a few. Numerical experiments on simulated data give the empirical evidence of the pertinence of the proposed methods, which outperform popular competitors particularly in case of bad initializations.

Submitted 29 June, 2023; v1 submitted 19 November, 2020; originally announced November 2020.

Comments: Computational Optimization and Applications, 2022
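
The abstract above mentions updating the estimate of the inverse Hessian directly instead of inverting a Hessian estimate at each iteration. It does not say which formula is used; one standard way to do this when the Hessian estimate changes by a rank-one term is the Sherman-Morrison identity, sketched below as an assumption rather than as the paper's method. The function name and the small example matrix are hypothetical.

```python
def sherman_morrison_update(inverse, vector, weight):
    """Return the inverse of (A + weight * v v^T), given inverse = A^{-1}.

    Assumes A is symmetric, so that v^T A^{-1} equals (A^{-1} v)^T.
    The cost is quadratic in the dimension; no matrix inversion is needed.
    """
    dim = len(vector)
    # u = A^{-1} v
    u = [sum(inverse[i][j] * vector[j] for j in range(dim)) for i in range(dim)]
    denominator = 1.0 + weight * sum(vector[i] * u[i] for i in range(dim))
    return [
        [inverse[i][j] - weight * u[i] * u[j] / denominator for j in range(dim)]
        for i in range(dim)
    ]


identity = [[1.0, 0.0], [0.0, 1.0]]
# Inverse of I + 0.5 * v v^T with v = (1, 2), i.e. of [[1.5, 1.0], [1.0, 3.0]].
updated_inverse = sherman_morrison_update(identity, [1.0, 2.0], 0.5)
print(updated_inverse)
```

In a logistic regression setting, the rank-one term would typically be built from the current feature vector, with a weight depending on the predicted probability; that choice is left out here because the abstract does not specify it.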

arXiv:2006.12920 [pdf, other]

An efficient Averaged Stochastic Gauss-Newton algorithm for estimating parameters of non linear regressions models

Authors: Peggy Cénac, Antoine Godichon-Baggioni, Bruno Portier

Abstract: Non linear regression models are a standard tool for modeling real phenomena, with several applications in machine learning, ecology, econometrics, and other fields. Estimating the parameters of the model has garnered a lot of attention for many years. We focus here on a recursive method for estimating parameters of non linear regressions. Indeed, these kinds of methods, the most famous of which are probably the stochastic gradient algorithm and its averaged version, can deal efficiently with massive data arriving sequentially. Nevertheless, they can be, in practice, very sensitive to the case where the eigenvalues of the Hessian of the functional we would like to minimize are at different scales. To avoid this problem, we first introduce an online Stochastic Gauss-Newton algorithm. In order to improve the behavior of the estimates in case of bad initialization, we also introduce a new Averaged Stochastic Gauss-Newton algorithm and prove its asymptotic efficiency.

Submitted 16 September, 2020; v1 submitted 23 June, 2020; originally announced June 2020.

arXiv:1904.07908 [pdf, other]

An efficient stochastic Newton algorithm for parameter estimation in logistic regressions

Authors: Bernard Bercu, Antoine Godichon-Baggioni, Bruno Portier

Abstract: Logistic regression is a well-known statistical model which is commonly used in the situation where the output is a binary random variable. It has a wide range of applications including machine learning, public health, social sciences, ecology and econometrics. In order to estimate the unknown parameters of logistic regression with data streams arriving sequentially and at high speed, we focus our attention on a recursive stochastic algorithm. More precisely, we investigate the asymptotic behavior of a new stochastic Newton algorithm. It makes it easy to update the estimates when the data arrive sequentially and to take search steps in all directions. We establish the almost sure convergence of our stochastic Newton algorithm as well as its asymptotic normality. All our theoretical results are illustrated by numerical experiments.

Submitted 16 April, 2019; originally announced April 2019.

arXiv:1710.07926 [pdf, other]

On the rates of convergence of Parallelized Averaged Stochastic Gradient Algorithms

Authors: Antoine Godichon-Baggioni, Sofiane Saadane

Abstract: The growing interest in high-dimensional and functional data analysis has led, over the last decade, to substantial research and a considerable number of techniques. Parallelized algorithms, which consist in distributing the data across different machines and processing them there, are one good way to deal with large samples taking values in high-dimensional spaces. We introduce here a parallelized averaged stochastic gradient algorithm, which makes it possible to process the data efficiently and recursively, regardless of whether the data are distributed uniformly across the machines. The rate of convergence in quadratic mean as well as the asymptotic normality of the parallelized estimates are given, for strongly and locally strongly convex objectives.

Submitted 22 October, 2017; originally announced October 2017.

arXiv:1704.06150 [pdf, other]

doi 10.1080/02664763.2018.1454894

Clustering transformed compositional data using K-means, with applications in gene expression and bicycle sharing system data

Authors: Antoine Godichon-Baggioni, Cathy Maugis-Rabusseau, Andrea Rau

Abstract: Although there is no shortage of clustering algorithms proposed in the literature, the question of the most relevant strategy for clustering compositional data (i.e., data made up of profiles, whose rows belong to the simplex) remains largely unexplored in cases where the observed value of an observation is equal to or close to zero for one or more samples. This work is motivated by the analysis of two sets of compositional data, both focused on the categorization of profiles but arising from considerably different applications: (1) identifying groups of co-expressed genes from high-throughput RNA sequencing data, in which a given gene may be completely silent in one or more experimental conditions; and (2) finding patterns in the usage of stations over the course of one week in the Velib' bicycle sharing system in Paris, France. For both of these applications, we focus on the use of appropriately chosen data transformations, including the Centered Log Ratio and a novel extension we propose called the Log Centered Log Ratio, in conjunction with the K-means algorithm. We use a nonasymptotic penalized criterion, whose penalty is calibrated with the slope heuristics, to select the number of clusters present in the data. Finally, we illustrate the performance of this clustering strategy, which is implemented in the Bioconductor package coseq, on both the gene expression and bicycle sharing system data.

Submitted 20 April, 2017; originally announced April 2017.

MSC Class: 62H30; 62P10
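
Illustration (not part of the arXiv record): the Centered Log Ratio named in the abstract above is usually defined as the log of each component minus the mean of the logs. The sketch below assumes that standard definition; the function name `clr` and the example profile are hypothetical, and the paper's Log Centered Log Ratio extension is not reproduced here. The sketch also shows why components equal to zero, the case the abstract highlights, are problematic for the plain transform.

```python
import math


def clr(profile):
    """Centered log-ratio transform of a strictly positive composition.

    Assumes the standard definition: log(p_j) minus the mean of the logs.
    """
    if any(p <= 0 for p in profile):
        # log(0) is undefined, so a zero component breaks the plain CLR.
        raise ValueError("CLR is undefined when a component is zero or negative")
    logs = [math.log(p) for p in profile]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical profile: proportions over three conditions, summing to 1.
profile = [0.2, 0.3, 0.5]
transformed = clr(profile)
```

The transformed coordinates sum to zero by construction, and it is on such transformed profiles that a K-means step would then operate.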

arXiv:1702.00931 [pdf, other]

Online estimation of the asymptotic variance for averaged stochastic gradient algorithms

Authors: Antoine Godichon-Baggioni

Abstract: Stochastic gradient algorithms are more and more studied since they can deal efficiently and online with large samples in high dimensional spaces. In this paper, we first establish a Central Limit Theorem for these estimates as well as for their averaged version in general Hilbert spaces. Moreover, since having the asymptotic normality of estimates is often unusable without an estimation of the asymptotic variance, we introduce a new recursive algorithm for estimating this last one, and we establish its almost sure rate of convergence as well as its rate of convergence in quadratic mean. Finally, two examples consisting in estimating the parameters of the logistic regression and estimating geometric quantiles are given.

Submitted 16 October, 2017; v1 submitted 3 February, 2017; originally announced February 2017.
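
Illustration (not part of the arXiv record): a minimal sketch of an averaged stochastic gradient recursion of the kind the abstract above refers to, applied to the geometric median (the central geometric quantile). The step size `c * n ** (-alpha)`, the starting point at the origin, the averaging convention, and all names are assumptions made for this example; it does not implement the paper's variance estimator.

```python
import math
import random


def averaged_sgd_geometric_median(samples, c=1.0, alpha=0.66):
    """One pass of stochastic gradient descent with online averaging.

    The objective is E||X - m||, whose stochastic gradient at m is
    -(X - m) / ||X - m||. Step sizes c * n**(-alpha) are an assumed choice.
    """
    dim = len(samples[0])
    iterate = [0.0] * dim   # current stochastic gradient iterate
    average = [0.0] * dim   # running mean of the iterates
    for n, x in enumerate(samples, start=1):
        diff = [xi - mi for xi, mi in zip(x, iterate)]
        norm = math.sqrt(sum(d * d for d in diff))
        step = c * n ** (-alpha)
        if norm > 0.0:
            # Move a distance `step` along the unit vector toward the sample.
            iterate = [mi + step * d / norm for mi, d in zip(iterate, diff)]
        # Online update of the average; no past iterate or sample is stored.
        average = [ai + (mi - ai) / n for ai, mi in zip(average, iterate)]
    return iterate, average


# Simulated data: 2D Gaussian centred at (1, -2), whose geometric median
# is its centre. The seed and sample size are arbitrary.
rng = random.Random(0)
samples = [(rng.gauss(1.0, 1.0), rng.gauss(-2.0, 1.0)) for _ in range(20000)]
last_iterate, averaged = averaged_sgd_geometric_median(samples)
```

Each observation is used once and then discarded, which is the online property the abstract emphasizes; the averaged sequence is the one for which asymptotic normality results of this type are typically stated.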

arXiv:1609.05479 [pdf, other]

Lp and almost sure rates of convergence of averaged stochastic gradient algorithms: locally strongly convex objective

Authors: Antoine Godichon-Baggioni

Abstract: An usual problem in statistics consists in estimating the minimizer of a convex function. When we have to deal with large samples taking values in high dimensional spaces, stochastic gradient algorithms and their averaged versions are efficient candidates. Indeed, (1) they do not need too much computational efforts, (2) they do not need to store all the data, which is crucial when we deal with big data, (3) they allow to simply update the estimates, which is important when data arrive sequentially. The aim of this work is to give asymptotic and non asymptotic rates of convergence of stochastic gradient estimates as well as of their averaged versions when the function we would like to minimize is only locally strongly convex.

Submitted 11 January, 2022; v1 submitted 18 September, 2016; originally announced September 2016.

arXiv:1606.04276 [pdf, other]

An averaged projected Robbins-Monro algorithm for estimating the parameters of a truncated spherical distribution

Authors: Antoine Godichon-Baggioni, Bruno Portier

Abstract: The objective of this work is to propose a new algorithm to fit a sphere on a noisy 3D point cloud distributed around a complete or a truncated sphere. More precisely, we introduce a projected Robbins-Monro algorithm and its averaged version for estimating the center and the radius of the sphere. We give asymptotic results such as the almost sure convergence of these algorithms as well as the asymptotic normality of the averaged algorithm. Furthermore, some non-asymptotic results will be given, such as the rates of convergence in quadratic mean. Some numerical experiments show the efficiency of the proposed algorithm on simulated data for small to moderate sample sizes.

Submitted 14 June, 2016; originally announced June 2016.

arXiv:1504.02852 [pdf, other]

Fast Estimation of the Median Covariation Matrix with Application to Online Robust Principal Components Analysis

Authors: Hervé Cardot, Antoine Godichon-Baggioni

Abstract: The geometric median covariation matrix is a robust multivariate indicator of dispersion which can be extended without any difficulty to functional data. We define estimators, based on recursive algorithms, that can be simply updated at each new observation and are able to deal rapidly with large samples of high dimensional data without being obliged to store all the data in memory. Asymptotic convergence properties of the recursive algorithms are studied under weak conditions. The computation of the principal components can also be performed online and this approach can be useful for online outlier detection. A simulation study clearly shows that this robust indicator is a competitive alternative to minimum covariance determinant when the dimension of the data is small and robust principal components analysis based on projection pursuit and spherical projections for high dimension data. An illustration on a large sample and high dimensional dataset consisting of individual TV audiences measured at a minute scale over a period of 24 hours confirms the interest of considering the robust principal components analysis based on the median covariation matrix. All studied algorithms are available in the R package Gmedian on CRAN.

Submitted 9 July, 2016; v1 submitted 11 April, 2015; originally announced April 2015.

Showing 1–24 of 24 results for author: Godichon-Baggioni, A