-
Pattern Mining for Anomaly Detection in Graphs: Application to Fraud in Public Procurement
Authors:
Lucas Potin,
Rosa Figueiredo,
Vincent Labatut,
Christine Largeron
Abstract:
In the context of public procurement, several indicators called red flags are used to estimate fraud risk. They are computed from certain contract attributes and therefore depend on the contract and award notices being filled in properly. However, these attributes are very often missing in practice, which prevents red flag computation. Traditional fraud detection approaches focus on tabular data only, considering each contract separately, and are therefore very sensitive to this issue. In this work, we adopt a graph-based method that leverages relations between contracts to compensate for the missing attributes. We propose PANG (Pattern-Based Anomaly Detection in Graphs), a general supervised framework relying on pattern extraction to detect anomalous graphs in a collection of attributed graphs. Notably, it is able to identify induced subgraphs, a type of pattern widely overlooked in the literature. When benchmarked on standard datasets, its predictive performance is on par with state-of-the-art methods, with the additional advantage of being explainable. These experiments also reveal that induced patterns are more discriminative on certain datasets. When applying PANG to public procurement data, the prediction is superior to other methods, and it identifies subgraph patterns that are characteristic of fraud-prone situations, thereby making it possible to better understand fraudulent behavior.
Submitted 19 June, 2023;
originally announced June 2023.
-
FOPPA: An Open Database of French Public Procurement Award Notices From 2010–2020
Authors:
Lucas Potin,
Vincent Labatut,
Pierre-Henri Morand,
Christine Largeron
Abstract:
Public procurement refers to governments' purchasing of goods, services, and construction of public works. In the European Union (EU), it is an essential sector, corresponding to 15% of the GDP. EU public procurement generates large amounts of data, because award notices related to contracts exceeding a predefined threshold must be published on the TED (EU's official journal). Under the framework of the DeCoMaP project, which aims at leveraging such data in order to predict fraud in public procurement, we build the FOPPA (French Open Public Procurement Award notices) database. It contains the description of 1,380,965 lots obtained from the TED, covering the 2010–2020 period for France. We detect a number of substantial issues in these data, and propose a set of automated and semi-automated methods to solve them and produce a usable database. It can be leveraged to study public procurement in an academic setting, but also to facilitate the monitoring of public policies, and to improve the quality of the data offered to buyers and suppliers.
Submitted 22 May, 2023;
originally announced May 2023.
-
A Survey on Fairness for Machine Learning on Graphs
Authors:
Charlotte Laclau,
Christine Largeron,
Manvi Choudhary
Abstract:
Nowadays, the analysis of complex phenomena modeled by graphs plays a crucial role in many real-world application domains where decisions can have a strong societal impact. However, numerous studies have recently revealed that machine learning models could lead to disparate treatment between individuals and unfair outcomes. In that context, algorithmic contributions for graph mining are not spared by the problem of fairness and present some specific challenges related to the intrinsic nature of graphs: (1) graph data is non-IID, which may invalidate many existing studies in fair machine learning; (2) suitable metric definitions are needed to assess the different types of fairness with relational data; and (3) it is algorithmically difficult to find a good trade-off between model accuracy and fairness. This survey is the first one dedicated to fairness for relational data. It aims to present a comprehensive review of state-of-the-art techniques in fairness on graph mining and identify the open challenges and future trends. In particular, we start by presenting several sensitive application domains and the associated graph mining tasks, with a focus on edge prediction and node classification in the remainder of the survey. We also recall the different metrics proposed to evaluate potential bias at different levels of the graph mining process; then we provide a comprehensive overview of recent contributions in the domain of fair machine learning for graphs, which we classify into pre-processing, in-processing and post-processing models. We also describe existing graph datasets, covering synthetic and real-world benchmarks. Finally, we present in detail five promising directions to advance research on algorithmic fairness on graphs.
Submitted 21 February, 2024; v1 submitted 11 May, 2022;
originally announced May 2022.
-
Linking the Dynamics of User Stance to the Structure of Online Discussions
Authors:
Christine Largeron,
Andrei Mardale,
Marian-Andrei Rizoiu
Abstract:
This paper studies the dynamics of opinion formation and polarization in social media. We investigate whether users' stance concerning contentious subjects is influenced by the online discussions they are exposed to and by interactions with users supporting different stances. We set up a series of predictive exercises based on machine learning models. Users are described using several posting activity features capturing their overall activity levels, posting success, the reactions their posts attract from users of different stances, and the types of discussions in which they engage. Given the user description at present, the purpose is to predict their stance in the future. Using a dataset of Brexit discussions on the Reddit platform, we show that the activity features regularly outperform the textual baseline, confirming the link between exposure to discussion and opinion. We find that the most informative features relate to the stance composition of the discussions in which users prefer to engage.
Submitted 27 February, 2021; v1 submitted 24 January, 2021;
originally announced January 2021.
-
All of the Fairness for Edge Prediction with Optimal Transport
Authors:
Charlotte Laclau,
Ievgen Redko,
Manvi Choudhary,
Christine Largeron
Abstract:
Machine learning and data mining algorithms have been increasingly used recently to support decision-making systems in many areas of high societal importance such as healthcare, education, or security. While being very efficient in their predictive abilities, the deployed algorithms sometimes tend to learn an inductive model with a discriminative bias due to the presence of such bias in the learning sample. This problem gave rise to a new field of algorithmic fairness where the goal is to correct the discriminative bias introduced by a certain attribute in order to decorrelate it from the model's output. In this paper, we study the problem of fairness for the task of edge prediction in graphs, a largely underinvestigated scenario compared to the more popular setting of fair classification. To this end, we formulate the problem of fair edge prediction, analyze it theoretically, and propose an embedding-agnostic repairing procedure for the adjacency matrix of an arbitrary graph with a trade-off between group and individual fairness. We experimentally show the versatility of our approach and its capacity to provide explicit control over different notions of fairness and prediction accuracy.
Submitted 30 October, 2020;
originally announced October 2020.
-
A Model for Managing Collections of Patterns
Authors:
Baptiste Jeudy,
Christine Largeron,
François Jacquenet
Abstract:
Data mining algorithms are now able to deal efficiently with huge amounts of data. Various kinds of patterns may be discovered and may have a great impact on the general development of knowledge. In many domains, end users may want to have their data mined by data mining tools in order to extract patterns that could impact their business. Nevertheless, those users are often overwhelmed by the large quantity of patterns extracted in such a situation. Moreover, privacy or commercial concerns may prevent users from mining the data themselves. Thus, the users may not be able to perform many experiments integrating various constraints in order to focus on the specific patterns they would like to extract. Post-processing of patterns may be an answer to that drawback. In this paper, we therefore present a framework that could allow end users to manage collections of patterns. We propose to use an efficient data structure on which some algebraic operators may be used in order to retrieve or access patterns in pattern bases.
Submitted 6 February, 2009;
originally announced February 2009.