-
Practical Considerations for Differential Privacy
Authors:
Kareem Amin,
Alex Kulesza,
Sergei Vassilvitskii
Abstract:
Differential privacy is the gold standard for statistical data release. Used by governments, companies, and academics, its mathematically rigorous guarantees and worst-case assumptions on the strength and knowledge of attackers make it a robust and compelling framework for reasoning about privacy. However, even with landmark successes, differential privacy has not achieved widespread adoption in everyday data use and data protection. In this work we examine some of the practical obstacles that stand in the way.
Submitted 14 August, 2024;
originally announced August 2024.
-
Mean estimation in the add-remove model of differential privacy
Authors:
Alex Kulesza,
Ananda Theertha Suresh,
Yuyan Wang
Abstract:
Differential privacy is often studied under two different models of neighboring datasets: the add-remove model and the swap model. While the swap model is frequently used in the academic literature to simplify analysis, many practical applications rely on the more conservative add-remove model, where obtaining tight results can be difficult. Here, we study the problem of one-dimensional mean estimation under the add-remove model. We propose a new algorithm and show that it is min-max optimal, achieving the best possible constant in the leading term of the mean squared error for all $ε$, and that this constant is the same as the optimal algorithm under the swap model. These results show that the add-remove and swap models give nearly identical errors for mean estimation, even though the add-remove model cannot treat the size of the dataset as public information. We also demonstrate empirically that our proposed algorithm yields at least a factor of two improvement in mean squared error over algorithms frequently used in practice. One of our main technical contributions is a new hour-glass mechanism, which might be of independent interest in other scenarios.
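For context, a common practical baseline under the add-remove model (not the hour-glass mechanism proposed in the paper) releases a Laplace-noised clipped sum and a Laplace-noised count, splitting ε evenly between them, and returns their ratio. Because the add-remove model cannot treat the dataset size as public, the count itself must be privatized. The sketch below assumes values are clipped to a known range [lo, hi]; the function name and the even budget split are illustrative choices.

```python
import numpy as np

def noisy_sum_over_count_mean(values, lo, hi, epsilon, rng=None):
    """Baseline DP mean under the add-remove model (illustrative sketch).

    Splits epsilon evenly between a Laplace-noised clipped sum and a
    Laplace-noised count, then returns their ratio clipped to [lo, hi].
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.clip(np.asarray(values, dtype=float), lo, hi)
    # Adding or removing one record changes the clipped sum by at most
    # max(|lo|, |hi|) and changes the count by exactly 1.
    sum_sensitivity = max(abs(lo), abs(hi))
    eps_half = epsilon / 2.0
    noisy_sum = x.sum() + rng.laplace(scale=sum_sensitivity / eps_half)
    noisy_count = len(x) + rng.laplace(scale=1.0 / eps_half)
    # Guard against tiny or negative noisy counts (a heuristic choice).
    mean = noisy_sum / max(noisy_count, 1.0)
    # Clipping the output is post-processing, so it costs no extra privacy.
    return float(np.clip(mean, lo, hi))
```

Both noise terms hit the ratio, which is one reason baselines of this kind leave room for the improvement in mean squared error that the abstract reports.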
Submitted 19 February, 2024; v1 submitted 11 December, 2023;
originally announced December 2023.
-
Subset-Based Instance Optimality in Private Estimation
Authors:
Travis Dick,
Alex Kulesza,
Ziteng Sun,
Ananda Theertha Suresh
Abstract:
We propose a new definition of instance optimality for differentially private estimation algorithms. Our definition requires an optimal algorithm to compete, simultaneously for every dataset $D$, with the best private benchmark algorithm that (a) knows $D$ in advance and (b) is evaluated by its worst-case performance on large subsets of $D$. That is, the benchmark algorithm need not perform well when potentially extreme points are added to $D$; it only has to handle the removal of a small number of real data points that already exist. This makes our benchmark significantly stronger than those proposed in prior work. We nevertheless show, for real-valued datasets, how to construct private algorithms that achieve our notion of instance optimality when estimating a broad class of dataset properties, including means, quantiles, and $\ell_p$-norm minimizers. For means in particular, we provide a detailed analysis and show that our algorithm simultaneously matches or exceeds the asymptotic performance of existing algorithms under a range of distributional assumptions.
Submitted 28 May, 2024; v1 submitted 1 March, 2023;
originally announced March 2023.
-
Plume: Differential Privacy at Scale
Authors:
Kareem Amin,
Jennifer Gillenwater,
Matthew Joseph,
Alex Kulesza,
Sergei Vassilvitskii
Abstract:
Differential privacy has become the standard for private data analysis, and an extensive literature now offers differentially private solutions to a wide variety of problems. However, translating these solutions into practical systems often requires confronting details that the literature ignores or abstracts away: users may contribute multiple records, the domain of possible records may be unknown, and the eventual system must scale to large volumes of data. Failure to carefully account for all three issues can severely impair a system's quality and usability.
We present Plume, a system built to address these problems. We describe a number of sometimes subtle implementation issues and offer practical solutions that, together, make an industrial-scale system for differentially private data analysis possible. Plume is currently deployed at Google and is routinely used to process datasets with trillions of records.
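One issue named above is that users may contribute multiple records. A standard remedy is contribution bounding: before any aggregation, keep at most k records per user, chosen at random, so that no single user's influence on a statistic exceeds a known bound. The single-machine sketch below only illustrates that idea; it is not Plume's implementation, which must run at far larger scale, and the function name and record format are hypothetical.

```python
import random
from collections import defaultdict

def bound_contributions(records, max_per_user, seed=None):
    """Keep at most `max_per_user` records per user, sampled uniformly.

    `records` is an iterable of (user_id, value) pairs (hypothetical format).
    """
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for user_id, value in records:
        by_user[user_id].append(value)
    bounded = []
    for user_id, values in by_user.items():
        if len(values) > max_per_user:
            # Random sampling avoids favoring records by arrival order.
            values = rng.sample(values, max_per_user)
        bounded.extend((user_id, v) for v in values)
    return bounded
```

After bounding, the sensitivity of a count or a clipped sum scales with `max_per_user` rather than with the largest number of records any single user happens to hold.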
Submitted 27 January, 2022;
originally announced January 2022.
-
Combining Public and Private Data
Authors:
Cecilia Ferrando,
Jennifer Gillenwater,
Alex Kulesza
Abstract:
Differential privacy is widely adopted to provide provable privacy guarantees in data analysis. We consider the problem of combining public and private data (and, more generally, data with heterogeneous privacy needs) for estimating aggregate statistics. We introduce a mixed estimator of the mean optimized to minimize the variance. We argue that our mechanism is preferable to techniques that preserve the privacy of individuals by subsampling data proportionally to the privacy needs of users. Similarly, we present a mixed median estimator based on the exponential mechanism. We compare our mechanisms to the methods proposed in Jorgensen et al. [2015]. Our experiments provide empirical evidence that our mechanisms often outperform the baseline methods.
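To illustrate the general idea of a variance-minimizing mixture (not necessarily the exact estimator in the paper), suppose we have an unbiased mean from public data and an unbiased Laplace-noised mean from private data. The minimum-variance convex combination of two independent unbiased estimates weights each by its inverse variance. The sketch assumes values lie in [0, 1], that the per-record variance `data_var` is known, and that the private count is public (swap model); all of these are simplifying assumptions.

```python
import numpy as np

def inverse_variance_combine(est_a, var_a, est_b, var_b):
    """Minimum-variance convex combination of two independent unbiased estimates."""
    w_a = var_b / (var_a + var_b)  # equals (1/var_a) / (1/var_a + 1/var_b)
    combined = w_a * est_a + (1.0 - w_a) * est_b
    combined_var = var_a * var_b / (var_a + var_b)
    return combined, combined_var

def mixed_mean(public, private, epsilon, data_var, rng=None):
    """Combine a public sample mean with a Laplace-noised private mean.

    Assumes values in [0, 1], known per-record variance `data_var`, and a
    public private-data count, so the private mean has sensitivity 1 / n.
    """
    rng = np.random.default_rng() if rng is None else rng
    public = np.asarray(public, dtype=float)
    private = np.clip(np.asarray(private, dtype=float), 0.0, 1.0)
    n_pub, n_priv = len(public), len(private)
    scale = 1.0 / (n_priv * epsilon)
    private_est = private.mean() + rng.laplace(scale=scale)
    var_pub = data_var / n_pub
    # Sampling variance plus Laplace noise variance (2 * scale^2).
    var_priv = data_var / n_priv + 2.0 * scale**2
    return inverse_variance_combine(public.mean(), var_pub, private_est, var_priv)
```

When ε is small the private estimate's variance grows and the weight shifts toward the public data; when ε is large the two are weighted roughly by sample size.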
Submitted 29 October, 2021;
originally announced November 2021.
-
Learning with User-Level Privacy
Authors:
Daniel Levy,
Ziteng Sun,
Kareem Amin,
Satyen Kale,
Alex Kulesza,
Mehryar Mohri,
Ananda Theertha Suresh
Abstract:
We propose and analyze algorithms to solve a range of learning tasks under user-level differential privacy constraints. Rather than guaranteeing only the privacy of individual samples, user-level DP protects a user's entire contribution ($m \ge 1$ samples), providing more stringent but more realistic protection against information leaks. We show that for high-dimensional mean estimation, empirical risk minimization with smooth losses, stochastic convex optimization, and learning hypothesis classes with finite metric entropy, the privacy cost decreases as $O(1/\sqrt{m})$ as users provide more samples. In contrast, when increasing the number of users $n$, the privacy cost decreases at a faster $O(1/n)$ rate. We complement these results with lower bounds showing the minimax optimality of our algorithms for mean estimation and stochastic convex optimization. Our algorithms rely on novel techniques for private mean estimation in arbitrary dimension with error scaling as the concentration radius $τ$ of the distribution rather than the entire range.
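A simple baseline for user-level privacy (not the paper's algorithm) makes the setting concrete: average each user's m samples, clip the per-user averages to a known range [lo, hi], and privatize the mean of the n averages. Since each user now contributes one bounded number, replacing a user's entire contribution moves the result by at most (hi - lo)/n. The noise here scales with the full range; the abstract's point is that error can instead scale with the concentration radius τ. The sketch assumes scalar data and a public number of users.

```python
import numpy as np

def user_level_mean(user_samples, lo, hi, epsilon, rng=None):
    """Baseline user-level DP mean for scalar data (illustrative sketch).

    `user_samples` holds one array of samples per user. The number of users
    n is treated as public, so replacing one user's whole contribution
    changes the average of clipped per-user means by at most (hi - lo) / n.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Collapse each user's contribution to a single clipped number.
    per_user = np.array([np.clip(np.mean(s), lo, hi) for s in user_samples])
    n = len(per_user)
    sensitivity = (hi - lo) / n
    return float(per_user.mean() + rng.laplace(scale=sensitivity / epsilon))
```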
Submitted 3 December, 2021; v1 submitted 23 February, 2021;
originally announced February 2021.
-
Differentially Private Quantiles
Authors:
Jennifer Gillenwater,
Matthew Joseph,
Alex Kulesza
Abstract:
Quantiles are often used for summarizing and understanding data. If that data is sensitive, it may be necessary to compute quantiles in a way that is differentially private, providing theoretical guarantees that the result does not reveal private information. However, when multiple quantiles are needed, existing differentially private algorithms fare poorly: they either compute quantiles individually, splitting the privacy budget, or summarize the entire distribution, wasting effort. In either case the result is reduced accuracy. In this work we propose an instance of the exponential mechanism that simultaneously estimates exactly $m$ quantiles from $n$ data points while guaranteeing differential privacy. The utility function is carefully structured to allow for an efficient implementation that returns estimates of all $m$ quantiles in time $O(mn\log(n) + m^2n)$. Experiments show that our method significantly outperforms the current state of the art on both real and synthetic data while remaining efficient enough to be practical.
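The paper's mechanism estimates all m quantiles jointly. As a simpler building block, the sketch below is the standard single-quantile exponential mechanism, not the paper's joint algorithm. It assumes data clipped to a known range [lo, hi] with lo < hi. Sorting the points splits [lo, hi] into n+1 intervals; interval i has i points below it and gets utility -|i - qn|, which changes by at most 1 when one point is swapped. An interval is chosen with probability proportional to its length times exp(ε·utility/2), and the output is uniform within it.

```python
import numpy as np

def dp_quantile(values, q, lo, hi, epsilon, rng=None):
    """Single-quantile exponential mechanism over data clipped to [lo, hi]."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.sort(np.clip(np.asarray(values, dtype=float), lo, hi))
    n = len(x)
    # Interval i spans [edges[i], edges[i+1]] and has exactly i points below it.
    edges = np.concatenate(([lo], x, [hi]))
    lengths = np.diff(edges)
    utility = -np.abs(np.arange(n + 1) - q * n)
    # Log-space weights: log(length) + epsilon * utility / 2 (sensitivity 1).
    # Zero-length intervals get log weight -inf, i.e. probability 0.
    with np.errstate(divide="ignore"):
        log_weights = np.log(lengths) + epsilon * utility / 2.0
    log_weights -= log_weights.max()
    probs = np.exp(log_weights)
    probs /= probs.sum()
    i = rng.choice(n + 1, p=probs)
    # Return a uniform draw from the chosen interval.
    return float(rng.uniform(edges[i], edges[i + 1]))
```

Calling this m times would split ε across the m quantiles, which is the accuracy loss the abstract describes; the joint mechanism avoids that split.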
Submitted 20 September, 2021; v1 submitted 16 February, 2021;
originally announced February 2021.
-
Information Extraction from Larger Multi-layer Social Networks
Authors:
Brandon Oselio,
Alex Kulesza,
Alfred Hero
Abstract:
Social networks often encode community structure using multiple distinct types of links between nodes. In this paper we introduce a novel method to extract information from such multi-layer networks, where each type of link forms its own layer. Using the concept of Pareto optimality, community detection in this multi-layer setting is formulated as a multiple criterion optimization problem. We propose an algorithm for finding an approximate Pareto frontier containing a family of solutions. The power of this approach is demonstrated on a Twitter dataset, where the nodes are hashtags and the layers correspond to (1) behavioral edges connecting pairs of hashtags whose temporal profiles are similar and (2) relational edges connecting pairs of hashtags that appear in the same tweets.
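To make the Pareto-optimality framing concrete: if each candidate community partition is scored by a cost on each layer (lower is better), the Pareto frontier is the set of candidates that no other candidate beats on every layer at once. The brute-force filter below is a generic sketch; the scores are hypothetical, and it does not reproduce the paper's algorithm for generating candidate partitions.

```python
def pareto_front(scores):
    """Return indices of non-dominated score vectors (all criteria minimized).

    Vector a dominates b if a <= b in every criterion and a < b in at least one.
    """
    def dominates(a, b):
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    return [
        i for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]

# Hypothetical (behavioral-layer cost, relational-layer cost) per partition.
candidates = [(1, 5), (2, 2), (3, 1), (4, 4), (2, 3)]
front = pareto_front(candidates)  # [0, 1, 2]
```

The quadratic scan is fine for a handful of candidates; each frontier member represents a different tradeoff between the two layers.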
Submitted 30 June, 2015;
originally announced July 2015.
-
Socio-Spatial Pareto Frontiers of Twitter Networks
Authors:
Brandon Oselio,
Alex Kulesza,
Alfred Hero
Abstract:
Social media provides a rich source of networked data. This data is represented by a set of nodes and a set of relations (edges). It is often possible to obtain or infer multiple types of relations from the same set of nodes, such as observed friend connections, inferred links via semantic comparison, or relations based on geographic proximity. These edge sets can be represented by one multi-layer network. In this paper we review a method to perform community detection in multilayer networks, and illustrate its use as a visualization tool for analyzing different community partitions. The algorithm is illustrated on a dataset from Twitter, specifically regarding the National Football League (NFL).
Submitted 29 June, 2015;
originally announced June 2015.
-
Diversifying Sparsity Using Variational Determinantal Point Processes
Authors:
Nematollah Kayhan Batmanghelich,
Gerald Quon,
Alex Kulesza,
Manolis Kellis,
Polina Golland,
Luke Bornn
Abstract:
We propose a novel diverse feature selection method based on determinantal point processes (DPPs). Our model enables one to flexibly define diversity based on the covariance of features (similar to orthogonal matching pursuit) or alternatively based on side information. We introduce our approach in the context of Bayesian sparse regression, employing a DPP as a variational approximation to the true spike and slab posterior distribution. We subsequently show how this variational DPP approximation generalizes and extends mean-field approximation, and can be learned efficiently by exploiting the fast sampling properties of DPPs. Our motivating application comes from bioinformatics, where we aim to identify a diverse set of genes whose expression profiles predict a tumor type, with diversity defined with respect to a gene-gene interaction network. We also explore an application in spatial statistics. In both cases, we demonstrate that the proposed method yields significantly more diverse feature sets than classic sparse methods, without compromising accuracy.
Submitted 23 November, 2014;
originally announced November 2014.
-
Expectation-Maximization for Learning Determinantal Point Processes
Authors:
Jennifer Gillenwater,
Alex Kulesza,
Emily Fox,
Ben Taskar
Abstract:
A determinantal point process (DPP) is a probabilistic model of set diversity compactly parameterized by a positive semi-definite kernel matrix. To fit a DPP to a given task, we would like to learn the entries of its kernel matrix by maximizing the log-likelihood of the available data. However, log-likelihood is non-convex in the entries of the kernel matrix, and this learning problem is conjectured to be NP-hard. Thus, previous work has instead focused on more restricted convex learning settings: learning only a single weight for each row of the kernel matrix, or learning weights for a linear combination of DPPs with fixed kernel matrices. In this work we propose a novel algorithm for learning the full kernel matrix. By changing the kernel parameterization from matrix entries to eigenvalues and eigenvectors, and then lower-bounding the likelihood in the manner of expectation-maximization algorithms, we obtain an effective optimization procedure. We test our method on a real-world product recommendation task, and achieve relative gains of up to 16.5% in test log-likelihood compared to the naive approach of maximizing likelihood by projected gradient ascent on the entries of the kernel matrix.
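For reference, in the common L-ensemble parameterization a DPP assigns $P(Y) = \det(L_Y) / \det(L + I)$, so observed subsets $Y_1, \dots, Y_T$ have log-likelihood $\sum_t \log\det(L_{Y_t}) - T \log\det(L + I)$. The helper below evaluates that objective with numpy. It is only the quantity being maximized, written under the L-ensemble assumption, not the paper's EM procedure or its eigenvalue/eigenvector reparameterization.

```python
import numpy as np

def dpp_log_likelihood(L, subsets):
    """Log-likelihood of observed subsets under an L-ensemble DPP.

    P(Y) = det(L_Y) / det(L + I), where L_Y keeps the rows and columns
    indexed by Y; the empty set contributes det = 1 (log det = 0).
    """
    L = np.asarray(L, dtype=float)
    _, logdet_norm = np.linalg.slogdet(L + np.eye(L.shape[0]))
    total = 0.0
    for Y in subsets:
        idx = np.asarray(Y, dtype=int)
        if idx.size == 0:
            continue
        sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
        if sign <= 0:
            return -np.inf  # singular submatrix: this subset has probability 0
        total += logdet
    return total - len(subsets) * logdet_norm
```

For example, with `L = [[2, 1], [1, 3]]` the four subsets have unnormalized weights 1, 2, 3 and 5, which sum to det(L + I) = 11, so the probabilities sum to one as they should.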
Submitted 4 November, 2014;
originally announced November 2014.
-
Social Collaborative Retrieval
Authors:
Ko-Jen Hsiao,
Alex Kulesza,
Alfred Hero
Abstract:
Socially-based recommendation systems have recently attracted significant interest, and a number of studies have shown that social information can dramatically improve a system's predictions of user interests. Meanwhile, there are now many potential applications that involve aspects of both recommendation and information retrieval, and the task of collaborative retrieval (a combination of these two traditional problems) has recently been introduced. Successful collaborative retrieval requires overcoming severe data sparsity, making additional sources of information, such as social graphs, particularly valuable. In this paper we propose a new model for collaborative retrieval, and show that our algorithm outperforms current state-of-the-art approaches by incorporating information from social networks. We also provide empirical analyses of the ways in which cultural interests propagate along a social graph using a real-world music dataset.
Submitted 8 April, 2014;
originally announced April 2014.
-
Multi-layer graph analysis for dynamic social networks
Authors:
Brandon Oselio,
Alex Kulesza,
Alfred O. Hero III
Abstract:
Modern social networks frequently encompass multiple distinct types of connectivity information; for instance, explicitly acknowledged friend relationships might complement behavioral measures that link users according to their actions or interests. One way to represent these networks is as multi-layer graphs, where each layer contains a unique set of edges over the same underlying vertices (users). Edges in different layers typically have related but distinct semantics; depending on the application, multiple layers might be used to reduce noise through averaging, to perform multifaceted analyses, or a combination of the two. However, it is not obvious how to extend standard graph analysis techniques to the multi-layer setting in a flexible way. In this paper we develop latent variable models and methods for mining multi-layer networks for connectivity patterns based on noisy data.
Submitted 11 May, 2014; v1 submitted 19 September, 2013;
originally announced September 2013.
-
Markov Determinantal Point Processes
Authors:
Raja Hafiz Affandi,
Alex Kulesza,
Emily B. Fox
Abstract:
A determinantal point process (DPP) is a random process useful for modeling the combinatorial problem of subset selection. In particular, DPPs encourage a random subset $Y$ to contain a diverse set of items selected from a base set $\mathcal{Y}$. For example, we might use a DPP to display a set of news headlines that are relevant to a user's interests while covering a variety of topics. Suppose, however, that we are asked to sequentially select multiple diverse sets of items, for example, displaying new headlines day-by-day. We might want these sets to be diverse not just individually but also through time, offering headlines today that are unlike the ones shown yesterday. In this paper, we construct a Markov DPP (M-DPP) that models a sequence of random sets $\{Y_t\}$. The proposed M-DPP defines a stationary process that maintains DPP margins. Crucially, the induced union process $Z_t = Y_t \cup Y_{t-1}$ is also marginally DPP-distributed. Jointly, these properties imply that the sequence of random sets is encouraged to be diverse both at a given time step and across time steps. We describe an exact, efficient sampling procedure, and a method for incrementally learning a quality measure over items in the base set $\mathcal{Y}$ based on external preferences. We apply the M-DPP to the task of sequentially displaying diverse and relevant news articles to a user with topic preferences.
Submitted 16 October, 2012;
originally announced October 2012.
-
Determinantal point processes for machine learning
Authors:
Alex Kulesza,
Ben Taskar
Abstract:
Determinantal point processes (DPPs) are elegant probabilistic models of repulsion that arise in quantum physics and random matrix theory. In contrast to traditional structured models like Markov random fields, which become intractable and hard to approximate in the presence of negative correlations, DPPs offer efficient and exact algorithms for sampling, marginalization, conditioning, and other inference tasks. We provide a gentle introduction to DPPs, focusing on the intuitions, algorithms, and extensions that are most relevant to the machine learning community, and show how DPPs can be applied to real-world applications like finding diverse sets of high-quality search results, building informative summaries by selecting diverse sentences from documents, modeling non-overlapping human poses in images or video, and automatically building timelines of important news stories.
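As one concrete instance of the exact sampling mentioned above, the sketch below follows the standard eigendecomposition-based sampler for an L-ensemble: keep each eigenvector independently with probability λ/(λ+1), then pick items one at a time from the span of the kept eigenvectors, projecting that span away from each chosen item. It is a plain numpy illustration without the numerical safeguards a production sampler would need; the function name is our own.

```python
import numpy as np

def sample_dpp(L, rng=None):
    """Draw one exact sample from the L-ensemble DPP with kernel L (spectral method)."""
    rng = np.random.default_rng() if rng is None else rng
    eigvals, eigvecs = np.linalg.eigh(np.asarray(L, dtype=float))
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative round-off
    # Phase 1: keep eigenvector n independently with probability lambda_n / (lambda_n + 1).
    keep = rng.random(len(eigvals)) < eigvals / (eigvals + 1.0)
    V = eigvecs[:, keep]
    sample = []
    # Phase 2: select one item per kept eigenvector.
    while V.shape[1] > 0:
        # P(i) is proportional to the squared norm of row i of V.
        probs = np.sum(V**2, axis=1)
        probs /= probs.sum()
        i = rng.choice(len(probs), p=probs)
        sample.append(int(i))
        # Zero out row i using a column with a nonzero entry there,
        # then drop that (now all-zero) column.
        j = np.argmax(np.abs(V[i, :]))
        V = V - np.outer(V[:, j], V[i, :] / V[i, j])
        V = np.delete(V, j, axis=1)
        if V.shape[1] > 0:
            V, _ = np.linalg.qr(V)  # re-orthonormalize the remaining columns
    return sorted(sample)
```

The sample size equals the number of eigenvectors kept in the first phase, and because the kept vectors span a subspace orthogonal to every chosen item's indicator, no item is drawn twice.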
Submitted 10 January, 2013; v1 submitted 25 July, 2012;
originally announced July 2012.
-
Learning Determinantal Point Processes
Authors:
Alex Kulesza,
Ben Taskar
Abstract:
Determinantal point processes (DPPs), which arise in random matrix theory and quantum physics, are natural models for subset selection problems where diversity is preferred. Among many remarkable properties, DPPs offer tractable algorithms for exact inference, including computing marginal probabilities and sampling; however, an important open question has been how to learn a DPP from labeled training data. In this paper we propose a natural feature-based parameterization of conditional DPPs, and show how it leads to a convex and efficient learning formulation. We analyze the relationship between our model and binary Markov random fields with repulsive potentials, which are qualitatively similar but computationally intractable. Finally, we apply our approach to the task of extractive summarization, where the goal is to choose a small subset of sentences conveying the most important information from a set of documents. In this task there is a fundamental tradeoff between sentences that are highly relevant to the collection as a whole, and sentences that are diverse and not repetitive. Our parameterization allows us to naturally balance these two characteristics. We evaluate our system on data from the DUC 2003/04 multi-document summarization task, achieving state-of-the-art results.
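The relevance/diversity tradeoff described above is often expressed by splitting the kernel into per-item quality scores and pairwise similarities. As an assumption for illustration (the abstract does not spell out the form), the sketch below builds $L_{ij} = q_i \, \phi_i^\top \phi_j \, q_j$ with unit-normalized diversity features $\phi_i$ and a log-linear quality model $q_i = \exp(\theta^\top f_i / 2)$, where the quality features $f_i$ and weights $\theta$ are hypothetical inputs.

```python
import numpy as np

def quality_diversity_kernel(quality_features, diversity_features, theta):
    """Build L = diag(q) S diag(q) with S_ij = phi_i . phi_j (phi unit-normalized).

    Assumes a log-linear quality model q_i = exp(theta . f_i / 2).
    """
    f = np.asarray(quality_features, dtype=float)      # shape (N, d_quality)
    phi = np.asarray(diversity_features, dtype=float)  # shape (N, d_diversity)
    q = np.exp(f @ np.asarray(theta, dtype=float) / 2.0)
    phi = phi / np.linalg.norm(phi, axis=1, keepdims=True)
    S = phi @ phi.T  # cosine similarity between items
    return q[:, None] * S * q[None, :]
```

Under this form $\det(L_{\{i,j\}}) = q_i^2 q_j^2 (1 - S_{ij}^2)$: quality raises the probability of each item on its own, while similarity lowers the probability that two items appear together, which mirrors the sentence-selection tradeoff in extractive summarization.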
Submitted 14 February, 2012;
originally announced February 2012.