Showing 1–24 of 24 results for author: De Cock, M

  1. arXiv:2402.08614  [pdf, other]

    cs.CR

    CaPS: Collaborative and Private Synthetic Data Generation from Distributed Sources

    Authors: Sikha Pentyala, Mayana Pereira, Martine De Cock

    Abstract: Data is the lifeblood of the modern world, forming a fundamental part of AI, decision-making, and research advances. With increase in interest in data, governments have taken important steps towards a regulated data world, drastically impacting data sharing and data usability and resulting in massive amounts of data confined within the walls of organizations. While synthetic data generation (SDG)…

    Submitted 8 June, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: In Proceedings of the 41st International Conference on Machine Learning, 2024

  2. arXiv:2402.06699  [pdf, other]

    cs.CR

    High Epsilon Synthetic Data Vulnerabilities in MST and PrivBayes

    Authors: Steven Golob, Sikha Pentyala, Anuar Maratkhan, Martine De Cock

    Abstract: Synthetic data generation (SDG) has become increasingly popular as a privacy-enhancing technology. It aims to maintain important statistical properties of its underlying training data, while excluding any personally identifiable information. There have been a whole host of SDG algorithms developed in recent years to improve and balance both of these aims. Many of these algorithms provide robust di…

    Submitted 9 February, 2024; originally announced February 2024.

  3. arXiv:2303.02916  [pdf, other]

    cs.IR cs.CR cs.CY

    Privacy-Preserving Fair Item Ranking

    Authors: Jia Ao Sun, Sikha Pentyala, Martine De Cock, Golnoosh Farnadi

    Abstract: Users worldwide access massive amounts of curated data in the form of rankings on a daily basis. The societal impact of this ease of access has been studied and work has been done to propose and enforce various notions of fairness in rankings. Current computational methods for fair item ranking rely on disclosing user data to a centralized server, which gives rise to privacy concerns for the users…

    Submitted 6 March, 2023; originally announced March 2023.

  4. arXiv:2210.07332  [pdf, other]

    cs.CR cs.LG

    Secure Multiparty Computation for Synthetic Data Generation from Distributed Data

    Authors: Mayana Pereira, Sikha Pentyala, Anderson Nascimento, Rafael T. de Sousa Jr., Martine De Cock

    Abstract: Legal and ethical restrictions on accessing relevant data inhibit data science research in critical domains such as health, finance, and education. Synthetic data generation algorithms with privacy guarantees are emerging as a paradigm to break this data logjam. Existing approaches, however, assume that the data holders supply their raw data to a trusted curator, who uses it as fuel for synthetic…

    Submitted 28 October, 2022; v1 submitted 13 October, 2022; originally announced October 2022.

  5. arXiv:2205.11584  [pdf, other]

    cs.LG cs.CR

    PrivFairFL: Privacy-Preserving Group Fairness in Federated Learning

    Authors: Sikha Pentyala, Nicola Neophytou, Anderson Nascimento, Martine De Cock, Golnoosh Farnadi

    Abstract: Group fairness ensures that the outcome of machine learning (ML) based decision making systems are not biased towards a certain group of people defined by a sensitive attribute such as gender or ethnicity. Achieving group fairness in Federated Learning (FL) is challenging because mitigating bias inherently requires using the sensitive attribute values of all clients, while FL is aimed precisely at…

    Submitted 26 August, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

  6. arXiv:2202.04058  [pdf, other]

    cs.LG cs.CR

    PrivFair: a Library for Privacy-Preserving Fairness Auditing

    Authors: Sikha Pentyala, David Melanson, Martine De Cock, Golnoosh Farnadi

    Abstract: Machine learning (ML) has become prominent in applications that directly affect people's quality of life, including in healthcare, justice, and finance. ML models have been found to exhibit discrimination based on sensitive attributes such as gender, race, or disability. Assessing if an ML model is free of bias remains challenging to date, and by definition has to be done with sensitive user chara…

    Submitted 23 May, 2022; v1 submitted 8 February, 2022; originally announced February 2022.

  7. arXiv:2202.02625  [pdf, other]

    cs.CR cs.LG

    Training Differentially Private Models with Secure Multiparty Computation

    Authors: Sikha Pentyala, Davis Railsback, Ricardo Maia, Rafael Dowsley, David Melanson, Anderson Nascimento, Martine De Cock

    Abstract: We address the problem of learning a machine learning model from training data that originates at multiple data owners while providing formal privacy guarantees regarding the protection of each owner's data. Existing solutions based on Differential Privacy (DP) achieve this at the cost of a drop in accuracy. Solutions based on Secure Multiparty Computation (MPC) do not incur such accuracy loss but…

    Submitted 1 September, 2022; v1 submitted 5 February, 2022; originally announced February 2022.

  8. arXiv:2106.02769  [pdf, other]

    cs.CR cs.LG

    Privacy-Preserving Training of Tree Ensembles over Continuous Data

    Authors: Samuel Adams, Chaitali Choudhary, Martine De Cock, Rafael Dowsley, David Melanson, Anderson C. A. Nascimento, Davis Railsback, Jianwei Shen

    Abstract: Most existing Secure Multi-Party Computation (MPC) protocols for privacy-preserving training of decision trees over distributed data assume that the features are categorical. In real-life applications, features are often numerical. The standard "in the clear" algorithm to grow decision trees on data with continuous values requires sorting of training examples for each feature in the quest for an…

    Submitted 4 June, 2021; originally announced June 2021.

  9. arXiv:2102.03517  [pdf, other]

    cs.CR cs.LG

    Privacy-Preserving Feature Selection with Secure Multiparty Computation

    Authors: Xiling Li, Rafael Dowsley, Martine De Cock

    Abstract: Existing work on privacy-preserving machine learning with Secure Multiparty Computation (MPC) is almost exclusively focused on model training and on inference with trained models, thereby overlooking the important data pre-processing stage. In this work, we propose the first MPC based protocol for private feature selection based on the filter method, which is independent of model training, and can…

    Submitted 6 February, 2021; originally announced February 2021.

  10. arXiv:2102.03513  [pdf, other]

    cs.CR cs.CV cs.LG

    Privacy-Preserving Video Classification with Convolutional Neural Networks

    Authors: Sikha Pentyala, Rafael Dowsley, Martine De Cock

    Abstract: Many video classification applications require access to personal data, thereby posing an invasive security risk to the users' privacy. We propose a privacy-preserving implementation of single-frame method based video classification with convolutional neural networks that allows a party to infer a label from a video without necessitating the video owner to disclose their video to other entities in…

    Submitted 6 February, 2021; originally announced February 2021.

  11. arXiv:2007.00253  [pdf, other]

    cs.CR cs.LG cs.SD eess.AS

    Private Speech Classification with Secure Multiparty Computation

    Authors: Kyle Bittner, Martine De Cock, Rafael Dowsley

    Abstract: Deep learning in audio signal processing, such as human voice audio signal classification, is a rich application area of machine learning. Legitimate use cases include voice authentication, gunfire detection, and emotion recognition. While there are clear advantages to automated human speech classification, application developers can gain knowledge beyond the professed scope from unprotected audio…

    Submitted 28 January, 2021; v1 submitted 1 July, 2020; originally announced July 2020.

  12. arXiv:2003.05703  [pdf, other]

    cs.CR stat.ML

    Inline Detection of DGA Domains Using Side Information

    Authors: Raaghavi Sivaguru, Jonathan Peck, Femi Olumofin, Anderson Nascimento, Martine De Cock

    Abstract: Malware applications typically use a command and control (C&C) server to manage bots to perform malicious activities. Domain Generation Algorithms (DGAs) are popular methods for generating pseudo-random domain names that can be used to establish a communication between an infected bot and the C&C server. In recent years, machine learning based systems have been widely used to detect DGAs. There ar…

    Submitted 12 March, 2020; originally announced March 2020.

  13. arXiv:2002.05377  [pdf, other]

    cs.CR cs.LG

    High Performance Logistic Regression for Privacy-Preserving Genome Analysis

    Authors: Martine De Cock, Rafael Dowsley, Anderson C. A. Nascimento, Davis Railsback, Jianwei Shen, Ariel Todoki

    Abstract: In this paper, we present a secure logistic regression training protocol and its implementation, with a new subprotocol to securely compute the activation function. To the best of our knowledge, we present the fastest existing secure Multi-Party Computation implementation for training logistic regression models on high dimensional genome data distributed across a local area network.

    Submitted 3 March, 2020; v1 submitted 13 February, 2020; originally announced February 2020.

  14. arXiv:2001.01177  [pdf, other]

    cs.SI cs.LG stat.ML

    User Profiling Using Hinge-loss Markov Random Fields

    Authors: Golnoosh Farnadi, Lise Getoor, Marie-Francine Moens, Martine De Cock

    Abstract: A variety of approaches have been proposed to automatically infer the profiles of users from their digital footprint in social media. Most of the proposed approaches focus on mining a single type of information, while ignoring other sources of available user-generated content (UGC). In this paper, we propose a mechanism to infer a variety of user characteristics, such as, age, gender and personali…

    Submitted 5 January, 2020; originally announced January 2020.

  15. arXiv:1907.01586  [pdf, other]

    cs.CR cs.LG

    Protecting Privacy of Users in Brain-Computer Interface Applications

    Authors: Anisha Agarwal, Rafael Dowsley, Nicholas D. McKinney, Dongrui Wu, Chin-Teng Lin, Martine De Cock, Anderson C. A. Nascimento

    Abstract: Machine learning (ML) is revolutionizing research and industry. Many ML applications rely on the use of large amounts of personal data for training and inference. Among the most intimate exploited data sources is electroencephalogram (EEG) data, a kind of data that is so rich with information that application developers can easily gain knowledge beyond the professed scope from unprotected EEG sign…

    Submitted 2 July, 2019; originally announced July 2019.

  16. arXiv:1906.02325  [pdf, other]

    cs.CR cs.IR cs.LG

    Privacy-Preserving Classification of Personal Text Messages with Secure Multi-Party Computation: An Application to Hate-Speech Detection

    Authors: Devin Reich, Ariel Todoki, Rafael Dowsley, Martine De Cock, Anderson C. A. Nascimento

    Abstract: Classification of personal text messages has many useful applications in surveillance, e-commerce, and mental health care, to name a few. Giving applications access to personal texts can easily lead to (un)intentional privacy violations. We propose the first privacy-preserving solution for text classification that is provably secure. Our method, which is based on Secure Multiparty Computation (SMC…

    Submitted 12 March, 2021; v1 submitted 5 June, 2019; originally announced June 2019.

  17. arXiv:1905.01078  [pdf, other]

    cs.LG cs.CR stat.ML

    CharBot: A Simple and Effective Method for Evading DGA Classifiers

    Authors: Jonathan Peck, Claire Nie, Raaghavi Sivaguru, Charles Grumer, Femi Olumofin, Bin Yu, Anderson Nascimento, Martine De Cock

    Abstract: Domain generation algorithms (DGAs) are commonly leveraged by malware to create lists of domain names which can be used for command and control (C&C) purposes. Approaches based on machine learning have recently been developed to automatically detect generated domain names in real-time. In this work, we present a novel DGA called CharBot which is capable of producing large numbers of unregistered d…

    Submitted 30 May, 2019; v1 submitted 3 May, 2019; originally announced May 2019.

  18. arXiv:1808.10151  [pdf, other]

    cs.CR cs.LG

    VirtualIdentity: Privacy-Preserving User Profiling

    Authors: Sisi Wang, Wing-Sea Poon, Golnoosh Farnadi, Caleb Horst, Kebra Thompson, Michael Nickels, Rafael Dowsley, Anderson C. A. Nascimento, Martine De Cock

    Abstract: User profiling from user generated content (UGC) is a common practice that supports the business models of many social media companies. Existing systems require that the UGC is fully exposed to the module that constructs the user profiles. In this paper we show that it is possible to build user profiles without ever accessing the user's original data, and without exposing the trained machine learn…

    Submitted 30 August, 2018; originally announced August 2018.

  19. Solving stable matching problems using answer set programming

    Authors: Sofie De Clercq, Steven Schockaert, Martine De Cock, Ann Nowé

    Abstract: Since the introduction of the stable marriage problem (SMP) by Gale and Shapley (1962), several variants and extensions have been investigated. While this variety is useful to widen the application potential, each variant requires a new algorithm for finding the stable matchings. To address this issue, we propose an encoding of the SMP using answer set programming (ASP), which can straightforwardl…

    Submitted 16 December, 2015; originally announced December 2015.

    Comments: Under consideration in Theory and Practice of Logic Programming (TPLP). arXiv admin note: substantial text overlap with arXiv:1302.7251

    Journal ref: Theory and Practice of Logic Programming 16 (2016) 247-268

  20. Characterizing and Extending Answer Set Semantics using Possibility Theory

    Authors: Kim Bauters, Steven Schockaert, Martine De Cock, Dirk Vermeir

    Abstract: Answer Set Programming (ASP) is a popular framework for modeling combinatorial problems. However, ASP cannot easily be used for reasoning about uncertain information. Possibilistic ASP (PASP) is an extension of ASP that combines possibilistic logic and ASP. In PASP a weight is associated with each rule, where this weight is interpreted as the certainty with which the conclusion can be established…

    Submitted 30 November, 2013; originally announced December 2013.

    Comments: 39 pages and 16 pages appendix with proofs. This article has been accepted for publication in Theory and Practice of Logic Programming, Copyright Cambridge University Press

    ACM Class: D.1.6; F.1.3

    Journal ref: Theory and Practice of Logic Programming 15 (2015) 79-116

  21. arXiv:1302.7251  [pdf, ps, other]

    cs.AI cs.LO

    Modeling Stable Matching Problems with Answer Set Programming

    Authors: Sofie De Clercq, Steven Schockaert, Martine De Cock, Ann Nowé

    Abstract: The Stable Marriage Problem (SMP) is a well-known matching problem first introduced and solved by Gale and Shapley (1962). Several variants and extensions to this problem have since been investigated to cover a wider set of applications. Each time a new variant is considered, however, a new algorithm needs to be developed and implemented. As an alternative, in this paper we propose an encoding of…

    Submitted 2 May, 2013; v1 submitted 28 February, 2013; originally announced February 2013.

    Comments: 26 pages

    MSC Class: 68N17

  22. arXiv:1203.3466  [pdf]

    cs.AI

    Possibilistic Answer Set Programming Revisited

    Authors: Kim Bauters, Steven Schockaert, Martine De Cock, Dirk Vermeir

    Abstract: Possibilistic answer set programming (PASP) extends answer set programming (ASP) by attaching to each rule a degree of certainty. While such an extension is important from an application point of view, existing semantics are not well-motivated, and do not always yield intuitive results. To develop a more suitable semantics, we first introduce a characterization of answer sets of classical ASP prog…

    Submitted 15 March, 2012; originally announced March 2012.

    Comments: Appears in Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI2010)

    Report number: UAI-P-2010-PG-48-55

  23. arXiv:1109.2434  [pdf, other]

    cs.LO cs.PL

    Expressiveness of Communication in Answer Set Programming

    Authors: Kim Bauters, Jeroen Janssen, Steven Schockaert, Dirk Vermeir, Martine De Cock

    Abstract: Answer set programming (ASP) is a form of declarative programming that allows to succinctly formulate and efficiently solve complex problems. An intuitive extension of this formalism is communicating ASP, in which multiple ASP programs collaborate to solve the problem at hand. However, the expressiveness of communicating ASP has not been thoroughly studied. In this paper, we present a systematic s…

    Submitted 12 September, 2011; originally announced September 2011.

    Comments: 35 pages. This article has been accepted for publication in Theory and Practice of Logic Programming, Copyright Cambridge University Press

    ACM Class: D.1.6; F.1.3

  24. arXiv:1104.5133  [pdf, other]

    cs.PL cs.LO

    Reducing Fuzzy Answer Set Programming to Model Finding in Fuzzy Logics

    Authors: Jeroen Janssen, Steven Schockaert, Dirk Vermeir, Martine De Cock

    Abstract: In recent years answer set programming has been extended to deal with multi-valued predicates. The resulting formalisms allows for the modeling of continuous problems as elegantly as ASP allows for the modeling of discrete problems, by combining the stable model semantics underlying ASP with fuzzy logics. However, contrary to the case of classical ASP where many efficient solvers have been constru…

    Submitted 27 April, 2011; originally announced April 2011.

    MSC Class: 68N17