-
Unmixing Noise from Hawkes Process to Model Learned Physiological Events
Authors:
Guillaume Staerman,
Virginie Loison,
Thomas Moreau
Abstract:
Physiological signal analysis often involves identifying events crucial to understanding biological dynamics. Traditional methods rely on handcrafted procedures or supervised learning, presenting challenges such as expert dependence, lack of robustness, and the need for extensive labeled data. Data-driven methods like Convolutional Dictionary Learning (CDL) offer an alternative but tend to produce…
▽ More
Physiological signal analysis often involves identifying events crucial to understanding biological dynamics. Traditional methods rely on handcrafted procedures or supervised learning, presenting challenges such as expert dependence, lack of robustness, and the need for extensive labeled data. Data-driven methods like Convolutional Dictionary Learning (CDL) offer an alternative but tend to produce spurious detections. This work introduces UNHaP (Unmix Noise from Hawkes Processes), a novel approach addressing the joint learning of temporal structures in events and the removal of spurious detections. Leveraging marked Hawkes processes, UNHaP distinguishes between events of interest and spurious ones. By treating the event detection output as a mixture of structured and unstructured events, UNHaP efficiently unmixes these processes and estimates their parameters. This approach significantly enhances the understanding of event distributions while minimizing false detection rates.
△ Less
Submitted 17 June, 2024;
originally announced June 2024.
-
Flexible Parametric Inference for Space-Time Hawkes Processes
Authors:
Emilia Siviero,
Guillaume Staerman,
Stephan Clémençon,
Thomas Moreau
Abstract:
Many modern spatio-temporal data sets, in sociology, epidemiology or seismology, for example, exhibit self-exciting characteristics, triggering and clustering behaviors both at the same time, that a suitable Hawkes space-time process can accurately capture. This paper aims to develop a fast and flexible parametric inference technique to recover the parameters of the kernel functions involved in th…
▽ More
Many modern spatio-temporal data sets, in sociology, epidemiology or seismology, for example, exhibit self-exciting characteristics, triggering and clustering behaviors both at the same time, that a suitable Hawkes space-time process can accurately capture. This paper aims to develop a fast and flexible parametric inference technique to recover the parameters of the kernel functions involved in the intensity function of a space-time Hawkes process based on such data. Our statistical approach combines three key ingredients: 1) kernels with finite support are considered, 2) the space-time domain is appropriately discretized, and 3) (approximate) precomputations are used. The inference technique we propose then consists of a $\ell_2$ gradient-based solver that is fast and statistically accurate. In addition to describing the algorithmic aspects, numerical experiments have been carried out on synthetic and real spatio-temporal data, providing solid empirical evidence of the relevance of the proposed methodology.
△ Less
Submitted 17 June, 2024; v1 submitted 10 June, 2024;
originally announced June 2024.
-
Signature Isolation Forest
Authors:
Guillaume Staerman,
Marta Campi,
Gareth W. Peters
Abstract:
Functional Isolation Forest (FIF) is a recent state-of-the-art Anomaly Detection (AD) algorithm designed for functional data. It relies on a tree partition procedure where an abnormality score is computed by projecting each curve observation on a drawn dictionary through a linear inner product. Such linear inner product and the dictionary are a priori choices that highly influence the algorithm's…
▽ More
Functional Isolation Forest (FIF) is a recent state-of-the-art Anomaly Detection (AD) algorithm designed for functional data. It relies on a tree partition procedure where an abnormality score is computed by projecting each curve observation on a drawn dictionary through a linear inner product. Such linear inner product and the dictionary are a priori choices that highly influence the algorithm's performances and might lead to unreliable results, particularly with complex datasets. This work addresses these challenges by introducing \textit{Signature Isolation Forest}, a novel AD algorithm class leveraging the rough path theory's signature transform. Our objective is to remove the constraints imposed by FIF through the proposition of two algorithms which specifically target the linearity of the FIF inner product and the choice of the dictionary. We provide several numerical experiments, including a real-world applications benchmark showing the relevance of our methods.
△ Less
Submitted 7 March, 2024;
originally announced March 2024.
-
Enhanced Hallucination Detection in Neural Machine Translation through Simple Detector Aggregation
Authors:
Anas Himmi,
Guillaume Staerman,
Marine Picot,
Pierre Colombo,
Nuno M. Guerreiro
Abstract:
Hallucinated translations pose significant threats and safety concerns when it comes to the practical deployment of machine translation systems. Previous research works have identified that detectors exhibit complementary performance different detectors excel at detecting different types of hallucinations. In this paper, we propose to address the limitations of individual detectors by combining th…
▽ More
Hallucinated translations pose significant threats and safety concerns when it comes to the practical deployment of machine translation systems. Previous research works have identified that detectors exhibit complementary performance different detectors excel at detecting different types of hallucinations. In this paper, we propose to address the limitations of individual detectors by combining them and introducing a straightforward method for aggregating multiple detectors. Our results demonstrate the efficacy of our aggregated detector, providing a promising step towards evermore reliable machine translation systems.
△ Less
Submitted 20 February, 2024;
originally announced February 2024.
-
Toward Stronger Textual Attack Detectors
Authors:
Pierre Colombo,
Marine Picot,
Nathan Noiry,
Guillaume Staerman,
Pablo Piantanida
Abstract:
The landscape of available textual adversarial attacks keeps growing, posing severe threats and raising concerns regarding the deep NLP system's integrity. However, the crucial problem of defending against malicious attacks has only drawn the attention of the NLP community. The latter is nonetheless instrumental in developing robust and trustworthy systems. This paper makes two important contribut…
▽ More
The landscape of available textual adversarial attacks keeps growing, posing severe threats and raising concerns regarding the deep NLP system's integrity. However, the crucial problem of defending against malicious attacks has only drawn the attention of the NLP community. The latter is nonetheless instrumental in developing robust and trustworthy systems. This paper makes two important contributions in this line of search: (i) we introduce LAROUSSE, a new framework to detect textual adversarial attacks and (ii) we introduce STAKEOUT, a new benchmark composed of nine popular attack methods, three datasets, and two pre-trained models. LAROUSSE is ready-to-use in production as it is unsupervised, hyperparameter-free, and non-differentiable, protecting it against gradient-based methods. Our new benchmark STAKEOUT allows for a robust evaluation framework: we conduct extensive numerical experiments which demonstrate that LAROUSSE outperforms previous methods, and which allows to identify interesting factors of detection rate variations.
△ Less
Submitted 21 October, 2023;
originally announced October 2023.
-
A Novel Information-Theoretic Objective to Disentangle Representations for Fair Classification
Authors:
Pierre Colombo,
Nathan Noiry,
Guillaume Staerman,
Pablo Piantanida
Abstract:
One of the pursued objectives of deep learning is to provide tools that learn abstract representations of reality from the observation of multiple contextual situations. More precisely, one wishes to extract disentangled representations which are (i) low dimensional and (ii) whose components are independent and correspond to concepts capturing the essence of the objects under consideration (Locate…
▽ More
One of the pursued objectives of deep learning is to provide tools that learn abstract representations of reality from the observation of multiple contextual situations. More precisely, one wishes to extract disentangled representations which are (i) low dimensional and (ii) whose components are independent and correspond to concepts capturing the essence of the objects under consideration (Locatello et al., 2019b). One step towards this ambitious project consists in learning disentangled representations with respect to a predefined (sensitive) attribute, e.g., the gender or age of the writer. Perhaps one of the main application for such disentangled representations is fair classification. Existing methods extract the last layer of a neural network trained with a loss that is composed of a cross-entropy objective and a disentanglement regularizer. In this work, we adopt an information-theoretic view of this problem which motivates a novel family of regularizers that minimizes the mutual information between the latent representation and the sensitive attribute conditional to the target. The resulting set of losses, called CLINIC, is parameter free and thus, it is easier and faster to train. CLINIC losses are studied through extensive numerical experiments by training over 2k neural networks. We demonstrate that our methods offer a better disentanglement/accuracy trade-off than previous techniques, and generalize better than training with cross-entropy loss solely provided that the disentanglement task is not too constraining.
△ Less
Submitted 21 October, 2023;
originally announced October 2023.
-
A Functional Data Perspective and Baseline On Multi-Layer Out-of-Distribution Detection
Authors:
Eduardo Dadalto,
Pierre Colombo,
Guillaume Staerman,
Nathan Noiry,
Pablo Piantanida
Abstract:
A key feature of out-of-distribution (OOD) detection is to exploit a trained neural network by extracting statistical patterns and relationships through the multi-layer classifier to detect shifts in the expected input data distribution. Despite achieving solid results, several state-of-the-art methods rely on the penultimate or last layer outputs only, leaving behind valuable information for OOD…
▽ More
A key feature of out-of-distribution (OOD) detection is to exploit a trained neural network by extracting statistical patterns and relationships through the multi-layer classifier to detect shifts in the expected input data distribution. Despite achieving solid results, several state-of-the-art methods rely on the penultimate or last layer outputs only, leaving behind valuable information for OOD detection. Methods that explore the multiple layers either require a special architecture or a supervised objective to do so. This work adopts an original approach based on a functional view of the network that exploits the sample's trajectories through the various layers and their statistical dependencies. It goes beyond multivariate features aggregation and introduces a baseline rooted in functional anomaly detection. In this new framework, OOD detection translates into detecting samples whose trajectories differ from the typical behavior characterized by the training set. We validate our method and empirically demonstrate its effectiveness in OOD detection compared to strong state-of-the-art baselines on computer vision benchmarks.
△ Less
Submitted 6 June, 2023;
originally announced June 2023.
-
Hypothesis Transfer Learning with Surrogate Classification Losses: Generalization Bounds through Algorithmic Stability
Authors:
Anass Aghbalou,
Guillaume Staerman
Abstract:
Hypothesis transfer learning (HTL) contrasts domain adaptation by allowing for a previous task leverage, named the source, into a new one, the target, without requiring access to the source data. Indeed, HTL relies only on a hypothesis learnt from such source data, relieving the hurdle of expansive data storage and providing great practical benefits. Hence, HTL is highly beneficial for real-world…
▽ More
Hypothesis transfer learning (HTL) contrasts domain adaptation by allowing for a previous task leverage, named the source, into a new one, the target, without requiring access to the source data. Indeed, HTL relies only on a hypothesis learnt from such source data, relieving the hurdle of expansive data storage and providing great practical benefits. Hence, HTL is highly beneficial for real-world applications relying on big data. The analysis of such a method from a theoretical perspective faces multiple challenges, particularly in classification tasks. This paper deals with this problem by studying the learning theory of HTL through algorithmic stability, an attractive theoretical framework for machine learning algorithms analysis. In particular, we are interested in the statistical behaviour of the regularized empirical risk minimizers in the case of binary classification. Our stability analysis provides learning guarantees under mild assumptions. Consequently, we derive several complexity-free generalization bounds for essential statistical quantities like the training error, the excess risk and cross-validation estimates. These refined bounds allow understanding the benefits of transfer learning and comparing the behaviour of standard losses in different scenarios, leading to valuable insights for practitioners.
△ Less
Submitted 14 July, 2023; v1 submitted 31 May, 2023;
originally announced May 2023.
-
Unsupervised Layer-wise Score Aggregation for Textual OOD Detection
Authors:
Maxime Darrin,
Guillaume Staerman,
Eduardo Dadalto Câmara Gomes,
Jackie CK Cheung,
Pablo Piantanida,
Pierre Colombo
Abstract:
Out-of-distribution (OOD) detection is a rapidly growing field due to new robustness and security requirements driven by an increased number of AI-based systems. Existing OOD textual detectors often rely on an anomaly score (e.g., Mahalanobis distance) computed on the embedding output of the last layer of the encoder. In this work, we observe that OOD detection performance varies greatly depending…
▽ More
Out-of-distribution (OOD) detection is a rapidly growing field due to new robustness and security requirements driven by an increased number of AI-based systems. Existing OOD textual detectors often rely on an anomaly score (e.g., Mahalanobis distance) computed on the embedding output of the last layer of the encoder. In this work, we observe that OOD detection performance varies greatly depending on the task and layer output. More importantly, we show that the usual choice (the last layer) is rarely the best one for OOD detection and that far better results could be achieved if the best layer were picked. To leverage this observation, we propose a data-driven, unsupervised method to combine layer-wise anomaly scores. In addition, we extend classical textual OOD benchmarks by including classification tasks with a greater number of classes (up to 77), which reflects more realistic settings. On this augmented benchmark, we show that the proposed post-aggregation methods achieve robust and consistent results while removing manual feature selection altogether. Their performance achieves near oracle's best layer performance.
△ Less
Submitted 21 February, 2024; v1 submitted 20 February, 2023;
originally announced February 2023.
-
Beyond Mahalanobis-Based Scores for Textual OOD Detection
Authors:
Pierre Colombo,
Eduardo D. C. Gomes,
Guillaume Staerman,
Nathan Noiry,
Pablo Piantanida
Abstract:
Deep learning methods have boosted the adoption of NLP systems in real-life applications. However, they turn out to be vulnerable to distribution shifts over time which may cause severe dysfunctions in production systems, urging practitioners to develop tools to detect out-of-distribution (OOD) samples through the lens of the neural network. In this paper, we introduce TRUSTED, a new OOD detector…
▽ More
Deep learning methods have boosted the adoption of NLP systems in real-life applications. However, they turn out to be vulnerable to distribution shifts over time which may cause severe dysfunctions in production systems, urging practitioners to develop tools to detect out-of-distribution (OOD) samples through the lens of the neural network. In this paper, we introduce TRUSTED, a new OOD detector for classifiers based on Transformer architectures that meets operational requirements: it is unsupervised and fast to compute. The efficiency of TRUSTED relies on the fruitful idea that all hidden layers carry relevant information to detect OOD examples. Based on this, for a given input, TRUSTED consists in (i) aggregating this information and (ii) computing a similarity score by exploiting the training distribution, leveraging the powerful concept of data depth. Our extensive numerical experiments involve 51k model configurations, including various checkpoints, seeds, and datasets, and demonstrate that TRUSTED achieves state-of-the-art performances. In particular, it improves previous AUROC over 3 points.
△ Less
Submitted 24 November, 2022;
originally announced November 2022.
-
FaDIn: Fast Discretized Inference for Hawkes Processes with General Parametric Kernels
Authors:
Guillaume Staerman,
Cédric Allain,
Alexandre Gramfort,
Thomas Moreau
Abstract:
Temporal point processes (TPP) are a natural tool for modeling event-based data. Among all TPP models, Hawkes processes have proven to be the most widely used, mainly due to their adequate modeling for various applications, particularly when considering exponential or non-parametric kernels. Although non-parametric kernels are an option, such models require large datasets. While exponential kernel…
▽ More
Temporal point processes (TPP) are a natural tool for modeling event-based data. Among all TPP models, Hawkes processes have proven to be the most widely used, mainly due to their adequate modeling for various applications, particularly when considering exponential or non-parametric kernels. Although non-parametric kernels are an option, such models require large datasets. While exponential kernels are more data efficient and relevant for specific applications where events immediately trigger more events, they are ill-suited for applications where latencies need to be estimated, such as in neuroscience. This work aims to offer an efficient solution to TPP inference using general parametric kernels with finite support. The developed solution consists of a fast $\ell_2$ gradient-based solver leveraging a discretized version of the events. After theoretically supporting the use of discretization, the statistical and computational efficiency of the novel approach is demonstrated through various numerical experiments. Finally, the method's effectiveness is evaluated by modeling the occurrence of stimuli-induced patterns from brain signals recorded with magnetoencephalography (MEG). Given the use of general parametric kernels, results show that the proposed approach leads to an improved estimation of pattern latency than the state-of-the-art.
△ Less
Submitted 2 August, 2023; v1 submitted 10 October, 2022;
originally announced October 2022.
-
Learning Disentangled Textual Representations via Statistical Measures of Similarity
Authors:
Pierre Colombo,
Guillaume Staerman,
Nathan Noiry,
Pablo Piantanida
Abstract:
When working with textual data, a natural application of disentangled representations is fair classification where the goal is to make predictions without being biased (or influenced) by sensitive attributes that may be present in the data (e.g., age, gender or race). Dominant approaches to disentangle a sensitive attribute from textual representations rely on learning simultaneously a penalizatio…
▽ More
When working with textual data, a natural application of disentangled representations is fair classification where the goal is to make predictions without being biased (or influenced) by sensitive attributes that may be present in the data (e.g., age, gender or race). Dominant approaches to disentangle a sensitive attribute from textual representations rely on learning simultaneously a penalization term that involves either an adversarial loss (e.g., a discriminator) or an information measure (e.g., mutual information). However, these methods require the training of a deep neural network with several parameter updates for each update of the representation model. As a matter of fact, the resulting nested optimization loop is both time consuming, adding complexity to the optimization dynamic, and requires a fine hyperparameter selection (e.g., learning rates, architecture). In this work, we introduce a family of regularizers for learning disentangled representations that do not require training. These regularizers are based on statistical measures of similarity between the conditional probability distributions with respect to the sensitive attributes. Our novel regularizers do not require additional training, are faster and do not involve additional tuning while achieving better results both when combined with pretrained and randomly initialized text encoders.
△ Less
Submitted 7 October, 2022; v1 submitted 7 May, 2022;
originally announced May 2022.
-
Functional Anomaly Detection: a Benchmark Study
Authors:
Guillaume Staerman,
Eric Adjakossa,
Pavlo Mozharovskyi,
Vera Hofer,
Jayant Sen Gupta,
Stephan Clémençon
Abstract:
The increasing automation in many areas of the Industry expressly demands to design efficient machine-learning solutions for the detection of abnormal events. With the ubiquitous deployment of sensors monitoring nearly continuously the health of complex infrastructures, anomaly detection can now rely on measurements sampled at a very high frequency, providing a very rich representation of the phen…
▽ More
The increasing automation in many areas of the Industry expressly demands to design efficient machine-learning solutions for the detection of abnormal events. With the ubiquitous deployment of sensors monitoring nearly continuously the health of complex infrastructures, anomaly detection can now rely on measurements sampled at a very high frequency, providing a very rich representation of the phenomenon under surveillance. In order to exploit fully the information thus collected, the observations cannot be treated as multivariate data anymore and a functional analysis approach is required. It is the purpose of this paper to investigate the performance of recent techniques for anomaly detection in the functional setup on real datasets. After an overview of the state-of-the-art and a visual-descriptive study, a variety of anomaly detection methods are compared. While taxonomies of abnormalities (e.g. shape, location) in the functional setup are documented in the literature, assigning a specific type to the identified anomalies appears to be a challenging task. Thus, strengths and weaknesses of the existing approaches are benchmarked in view of these highlighted types in a simulation study. Anomaly detection methods are next evaluated on two datasets, related to the monitoring of helicopters in flight and to the spectrometry of construction materials namely. The benchmark analysis is concluded by recommendation guidance for practitioners.
△ Less
Submitted 13 January, 2022;
originally announced January 2022.
-
Automatic Text Evaluation through the Lens of Wasserstein Barycenters
Authors:
Pierre Colombo,
Guillaume Staerman,
Chloe Clavel,
Pablo Piantanida
Abstract:
A new metric \texttt{BaryScore} to evaluate text generation based on deep contextualized embeddings e.g., BERT, Roberta, ELMo) is introduced. This metric is motivated by a new framework relying on optimal transport tools, i.e., Wasserstein distance and barycenter. By modelling the layer output of deep contextualized embeddings as a probability distribution rather than by a vector embedding; this f…
▽ More
A new metric \texttt{BaryScore} to evaluate text generation based on deep contextualized embeddings e.g., BERT, Roberta, ELMo) is introduced. This metric is motivated by a new framework relying on optimal transport tools, i.e., Wasserstein distance and barycenter. By modelling the layer output of deep contextualized embeddings as a probability distribution rather than by a vector embedding; this framework provides a natural way to aggregate the different outputs through the Wasserstein space topology. In addition, it provides theoretical grounds to our metric and offers an alternative to available solutions e.g., MoverScore and BertScore). Numerical evaluation is performed on four different tasks: machine translation, summarization, data2text generation and image captioning. Our results show that \texttt{BaryScore} outperforms other BERT based metrics and exhibits more consistent behaviour in particular for text summarization.
△ Less
Submitted 9 September, 2021; v1 submitted 27 August, 2021;
originally announced August 2021.
-
Affine-Invariant Integrated Rank-Weighted Depth: Definition, Properties and Finite Sample Analysis
Authors:
Guillaume Staerman,
Pavlo Mozharovskyi,
Stéphan Clémençon
Abstract:
Because it determines a center-outward ordering of observations in $\mathbb{R}^d$ with $d\geq 2$, the concept of statistical depth permits to define quantiles and ranks for multivariate data and use them for various statistical tasks (e.g. inference, hypothesis testing). Whereas many depth functions have been proposed \textit{ad-hoc} in the literature since the seminal contribution of \cite{Tukey7…
▽ More
Because it determines a center-outward ordering of observations in $\mathbb{R}^d$ with $d\geq 2$, the concept of statistical depth permits to define quantiles and ranks for multivariate data and use them for various statistical tasks (e.g. inference, hypothesis testing). Whereas many depth functions have been proposed \textit{ad-hoc} in the literature since the seminal contribution of \cite{Tukey75}, not all of them possess the properties desirable to emulate the notion of quantile function for univariate probability distributions. In this paper, we propose an extension of the \textit{integrated rank-weighted} statistical depth (IRW depth in abbreviated form) originally introduced in \cite{IRW}, modified in order to satisfy the property of \textit{affine-invariance}, fulfilling thus all the four key axioms listed in the nomenclature elaborated by \cite{ZuoS00a}. The variant we propose, referred to as the Affine-Invariant IRW depth (AI-IRW in short), involves the covariance/precision matrices of the (supposedly square integrable) $d$-dimensional random vector $X$ under study, in order to take into account the directions along which $X$ is most variable to assign a depth value to any point $x\in \mathbb{R}^d$. The accuracy of the sampling version of the AI-IRW depth is investigated from a nonasymptotic perspective. Namely, a concentration result for the statistical counterpart of the AI-IRW depth is proved. Beyond the theoretical analysis carried out, applications to anomaly detection are considered and numerical results are displayed, providing strong empirical evidence of the relevance of the depth function we propose here.
△ Less
Submitted 4 February, 2022; v1 submitted 21 June, 2021;
originally announced June 2021.
-
A Pseudo-Metric between Probability Distributions based on Depth-Trimmed Regions
Authors:
Guillaume Staerman,
Pavlo Mozharovskyi,
Pierre Colombo,
Stéphan Clémençon,
Florence d'Alché-Buc
Abstract:
The design of a metric between probability distributions is a longstanding problem motivated by numerous applications in Machine Learning. Focusing on continuous probability distributions on the Euclidean space $\mathbb{R}^d$, we introduce a novel pseudo-metric between probability distributions by leveraging the extension of univariate quantiles to multivariate spaces. Data depth is a nonparametri…
▽ More
The design of a metric between probability distributions is a longstanding problem motivated by numerous applications in Machine Learning. Focusing on continuous probability distributions on the Euclidean space $\mathbb{R}^d$, we introduce a novel pseudo-metric between probability distributions by leveraging the extension of univariate quantiles to multivariate spaces. Data depth is a nonparametric statistical tool that measures the centrality of any element $x\in\mathbb{R}^d$ with respect to (w.r.t.) a probability distribution or a data set. It is a natural median-oriented extension of the cumulative distribution function (cdf) to the multivariate case. Thus, its upper-level sets -- the depth-trimmed regions -- give rise to a definition of multivariate quantiles. The new pseudo-metric relies on the average of the Hausdorff distance between the depth-based quantile regions w.r.t. each distribution. Its good behavior w.r.t. major transformation groups, as well as its ability to factor out translations, are depicted. Robustness, an appealing feature of this pseudo-metric, is studied through the finite sample breakdown point. Moreover, we propose an efficient approximation method with linear time complexity w.r.t. the size of the data set and its dimension. The quality of this approximation as well as the performance of the proposed approach are illustrated in numerical experiments.
△ Less
Submitted 10 October, 2022; v1 submitted 23 March, 2021;
originally announced March 2021.
-
When OT meets MoM: Robust estimation of Wasserstein Distance
Authors:
Guillaume Staerman,
Pierre Laforgue,
Pavlo Mozharovskyi,
Florence d'Alché-Buc
Abstract:
Issued from Optimal Transport, the Wasserstein distance has gained importance in Machine Learning due to its appealing geometrical properties and the increasing availability of efficient approximations. In this work, we consider the problem of estimating the Wasserstein distance between two probability distributions when observations are polluted by outliers. To that end, we investigate how to lev…
▽ More
Issued from Optimal Transport, the Wasserstein distance has gained importance in Machine Learning due to its appealing geometrical properties and the increasing availability of efficient approximations. In this work, we consider the problem of estimating the Wasserstein distance between two probability distributions when observations are polluted by outliers. To that end, we investigate how to leverage Medians of Means (MoM) estimators to robustify the estimation of Wasserstein distance. Exploiting the dual Kantorovitch formulation of Wasserstein distance, we introduce and discuss novel MoM-based robust estimators whose consistency is studied under a data contamination model and for which convergence rates are provided. These MoM estimators enable to make Wasserstein Generative Adversarial Network (WGAN) robust to outliers, as witnessed by an empirical study on two benchmarks CIFAR10 and Fashion MNIST. Eventually, we discuss how to combine MoM with the entropy-regularized approximation of the Wasserstein distance and propose a simple MoM-based re-weighting scheme that could be used in conjunction with the Sinkhorn algorithm.
△ Less
Submitted 18 February, 2022; v1 submitted 18 June, 2020;
originally announced June 2020.
-
Generalization Bounds in the Presence of Outliers: a Median-of-Means Study
Authors:
Pierre Laforgue,
Guillaume Staerman,
Stephan Clémençon
Abstract:
In contrast to the empirical mean, the Median-of-Means (MoM) is an estimator of the mean $θ$ of a square integrable r.v. $Z$, around which accurate nonasymptotic confidence bounds can be built, even when $Z$ does not exhibit a sub-Gaussian tail behavior. Thanks to the high confidence it achieves on heavy-tailed data, MoM has found various applications in machine learning, where it is used to desig…
▽ More
In contrast to the empirical mean, the Median-of-Means (MoM) is an estimator of the mean $θ$ of a square integrable r.v. $Z$, around which accurate nonasymptotic confidence bounds can be built, even when $Z$ does not exhibit a sub-Gaussian tail behavior. Thanks to the high confidence it achieves on heavy-tailed data, MoM has found various applications in machine learning, where it is used to design training procedures that are not sensitive to atypical observations. More recently, a new line of work is now trying to characterize and leverage MoM's ability to deal with corrupted data. In this context, the present work proposes a general study of MoM's concentration properties under the contamination regime, that provides a clear understanding of the impact of the outlier proportion and the number of blocks chosen. The analysis is extended to (multisample) $U$-statistics, i.e. averages over tuples of observations, that raise additional challenges due to the dependence induced. Finally, we show that the latter bounds can be used in a straightforward fashion to derive generalization guarantees for pairwise learning in a contaminated setting, and propose an algorithm to compute provably reliable decision functions.
△ Less
Submitted 7 February, 2021; v1 submitted 9 June, 2020;
originally announced June 2020.
-
The Area of the Convex Hull of Sampled Curves: a Robust Functional Statistical Depth Measure
Authors:
Guillaume Staerman,
Pavlo Mozharovskyi,
Stephan Clémençon
Abstract:
With the ubiquity of sensors in the IoT era, statistical observations are becoming increasingly available in the form of massive (multivariate) time-series. Formulated as unsupervised anomaly detection tasks, an abundance of applications like aviation safety management, the health monitoring of complex infrastructures or fraud detection can now rely on such functional data, acquired and stored wit…
▽ More
With the ubiquity of sensors in the IoT era, statistical observations are becoming increasingly available in the form of massive (multivariate) time-series. Formulated as unsupervised anomaly detection tasks, an abundance of applications like aviation safety management, the health monitoring of complex infrastructures or fraud detection can now rely on such functional data, acquired and stored with an ever finer granularity. The concept of statistical depth, which reflects centrality of an arbitrary observation w.r.t. a statistical population may play a crucial role in this regard, anomalies corresponding to observations with 'small' depth. Supported by sound theoretical and computational developments in the recent decades, it has proven to be extremely useful, in particular in functional spaces. However, most approaches documented in the literature consist in evaluating independently the centrality of each point forming the time series and consequently exhibit a certain insensitivity to possible shape changes. In this paper, we propose a novel notion of functional depth based on the area of the convex hull of sampled curves, capturing gradual departures from centrality, even beyond the envelope of the data, in a natural fashion. We discuss practical relevance of commonly imposed axioms on functional depths and investigate which of them are satisfied by the notion of depth we promote here. Estimation and computational issues are also addressed and various numerical experiments provide empirical evidence of the relevance of the approach proposed.
△ Less
Submitted 13 February, 2020; v1 submitted 9 October, 2019;
originally announced October 2019.
-
Functional Isolation Forest
Authors:
Guillaume Staerman,
Pavlo Mozharovskyi,
Stephan Clémençon,
Florence d'Alché-Buc
Abstract:
For the purpose of monitoring the behavior of complex infrastructures (e.g. aircrafts, transport or energy networks), high-rate sensors are deployed to capture multivariate data, generally unlabeled, in quasi continuous-time to detect quickly the occurrence of anomalies that may jeopardize the smooth operation of the system of interest. The statistical analysis of such massive data of functional n…
▽ More
For the purpose of monitoring the behavior of complex infrastructures (e.g. aircrafts, transport or energy networks), high-rate sensors are deployed to capture multivariate data, generally unlabeled, in quasi continuous-time to detect quickly the occurrence of anomalies that may jeopardize the smooth operation of the system of interest. The statistical analysis of such massive data of functional nature raises many challenging methodological questions. The primary goal of this paper is to extend the popular Isolation Forest (IF) approach to Anomaly Detection, originally dedicated to finite dimensional observations, to functional data. The major difficulty lies in the wide variety of topological structures that may equip a space of functions and the great variety of patterns that may characterize abnormal curves. We address the issue of (randomly) splitting the functional space in a flexible manner in order to isolate progressively any trajectory from the others, a key ingredient to the efficiency of the algorithm. Beyond a detailed description of the algorithm, computational complexity and stability issues are investigated at length. From the scoring function measuring the degree of abnormality of an observation provided by the proposed variant of the IF algorithm, a Functional Statistical Depth function is defined and discussed as well as a multivariate functional extension. Numerical experiments provide strong empirical evidence of the accuracy of the extension proposed.
△ Less
Submitted 9 October, 2019; v1 submitted 9 April, 2019;
originally announced April 2019.