Search | arXiv e-print repository

Private prediction for large-scale synthetic text generation

Authors: Kareem Amin, Alex Bie, Weiwei Kong, Alexey Kurakin, Natalia Ponomareva, Umar Syed, Andreas Terzis, Sergei Vassilvitskii

Abstract: We present an approach for generating differentially private synthetic text using large language models (LLMs), via private prediction. In the private prediction framework, we only require the output synthetic data to satisfy differential privacy guarantees. This is in contrast to approaches that train a generative model on potentially sensitive user-supplied source data and seek to ensure the model itself is safe to release. We prompt a pretrained LLM with source data, but ensure that next-token predictions are made with differential privacy guarantees. Previous work in this paradigm reported generating a small number of examples (<10) at reasonable privacy levels, an amount of data that is useful only for downstream in-context learning or prompting. In contrast, we make changes that allow us to generate thousands of high-quality synthetic data points, greatly expanding the set of potential applications. Our improvements come from an improved privacy analysis and a better private selection mechanism, which makes use of the equivalence between the softmax layer for sampling tokens in LLMs and the exponential mechanism. Furthermore, we introduce a novel use of public predictions via the sparse vector technique, in which we do not pay privacy costs for tokens that are predictable without sensitive data; we find this to be particularly effective for structured data.

Submitted 16 July, 2024; originally announced July 2024.

Comments: 12 pages main text + 15 pages appendix
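
Note: the equivalence between softmax sampling and the exponential mechanism mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation. It uses the textbook exponential mechanism, which samples candidate r with probability proportional to exp(epsilon * u(r) / (2 * sensitivity)), and treats the logits as utilities. The function names, the example logits, the temperature, and the sensitivity value are all illustrative assumptions; in practice the sensitivity would have to be enforced, for example by clipping logits.

```python
import math
import random


def softmax(logits, temperature):
    """Softmax over logits at the given temperature (numerically stable)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def exponential_mechanism_probs(utilities, epsilon, sensitivity):
    """Textbook exponential mechanism: P(r) proportional to exp(eps * u(r) / (2 * sensitivity))."""
    scores = [epsilon * u / (2.0 * sensitivity) for u in utilities]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def epsilon_for_temperature(temperature, sensitivity):
    """Epsilon at which the exponential mechanism coincides with softmax at this temperature."""
    return 2.0 * sensitivity / temperature


def sample_token(probs, rng):
    """Draw one token index from the given distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical values, chosen only for illustration.
logits = [1.0, -0.5, 0.25]
temperature = 2.0
sensitivity = 0.1  # assumed bound on how much one record can change any logit

eps = epsilon_for_temperature(temperature, sensitivity)
p_softmax = softmax(logits, temperature)
p_expmech = exponential_mechanism_probs(logits, eps, sensitivity)
token = sample_token(p_softmax, random.Random(0))
```

Under these assumptions, sampling a token from the softmax at temperature T is the same as running the exponential mechanism with epsilon = 2 * sensitivity / T, so a higher temperature corresponds to a smaller per-token privacy cost.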

arXiv:2406.17814 [pdf, ps, other]

Distribution Learnability and Robustness

Authors: Shai Ben-David, Alex Bie, Gautam Kamath, Tosca Lechner

Abstract: We examine the relationship between learnability and robust (or agnostic) learnability for the problem of distribution learning. We show that, contrary to other learning settings (e.g., PAC learning of function classes), realizable learnability of a class of probability distributions does not imply its agnostic learnability. We go on to examine what type of data corruption can disrupt the learnability of a distribution class and what such learnability is robust against. We show that realizable learnability of a class of distributions implies its robust learnability with respect to only additive corruption, but not against subtractive corruption. We also explore related implications in the context of compression schemes and differentially private learnability.

Submitted 25 June, 2024; originally announced June 2024.

Comments: In NeurIPS 2023

arXiv:2402.01862 [pdf, other]

Parametric Feature Transfer: One-shot Federated Learning with Foundation Models

Authors: Mahdi Beitollahi, Alex Bie, Sobhan Hemati, Leo Maxime Brunswic, Xu Li, Xi Chen, Guojun Zhang

Abstract: In one-shot federated learning (FL), clients collaboratively train a global model in a single round of communication. Existing approaches for one-shot FL enhance communication efficiency at the expense of diminished accuracy. This paper introduces FedPFT (Federated Learning with Parametric Feature Transfer), a methodology that harnesses the transferability of foundation models to enhance both accuracy and communication efficiency in one-shot FL. The approach involves transferring per-client parametric models (specifically, Gaussian mixtures) of features extracted from foundation models. Subsequently, each parametric model is employed to generate synthetic features for training a classifier head. Experimental results on eight datasets demonstrate that FedPFT enhances the communication-accuracy frontier in both centralized and decentralized FL scenarios, as well as across diverse data-heterogeneity settings such as covariate shift and task shift, with improvements of up to 20.6%. Additionally, FedPFT adheres to the data minimization principle of FL, as clients do not send real features. We demonstrate that sending real features is vulnerable to potent reconstruction attacks. Moreover, we show that FedPFT is amenable to formal privacy guarantees via differential privacy, demonstrating favourable privacy-accuracy tradeoffs.

Submitted 2 February, 2024; originally announced February 2024.

Comments: 20 pages, 12 figures

arXiv:2308.09565 [pdf, other]

Understanding the Role of Layer Normalization in Label-Skewed Federated Learning

Authors: Guojun Zhang, Mahdi Beitollahi, Alex Bie, Xi Chen

Abstract: Layer normalization (LN) is a widely adopted deep learning technique especially in the era of foundation models. Recently, LN has been shown to be surprisingly effective in federated learning (FL) with non-i.i.d. data. However, exactly why and how it works remains mysterious. In this work, we reveal the profound connection between layer normalization and the label shift problem in federated learning. To understand layer normalization better in FL, we identify the key contributing mechanism of normalization methods in FL, called feature normalization (FN), which applies normalization to the latent feature representation before the classifier head. Although LN and FN do not improve expressive power, they control feature collapse and local overfitting to heavily skewed datasets, and thus accelerate global training. Empirically, we show that normalization leads to drastic improvements on standard benchmarks under extreme label shift. Moreover, we conduct extensive ablation studies to understand the critical factors of layer normalization in FL. Our results verify that FN is an essential ingredient inside LN to significantly improve the convergence of FL while remaining robust to learning rate choices, especially under extreme label shift where each client has access to few classes. Our code is available at https://github.com/huawei-noah/Federated-Learning/tree/main/Layer_Normalization.

Submitted 14 February, 2024; v1 submitted 18 August, 2023; originally announced August 2023.

Comments: accepted at TMLR

arXiv:2308.06239 [pdf, ps, other]

Private Distribution Learning with Public Data: The View from Sample Compression

Authors: Shai Ben-David, Alex Bie, Clément L. Canonne, Gautam Kamath, Vikrant Singhal

Abstract: We study the problem of private distribution learning with access to public data. In this setup, which we refer to as public-private learning, the learner is given public and private samples drawn from an unknown distribution $p$ belonging to a class $\mathcal Q$, with the goal of outputting an estimate of $p$ while adhering to privacy constraints (here, pure differential privacy) only with respect to the private samples. We show that the public-private learnability of a class $\mathcal Q$ is connected to the existence of a sample compression scheme for $\mathcal Q$, as well as to an intermediate notion we refer to as list learning. Leveraging this connection: (1) approximately recovers previous results on Gaussians over $\mathbb R^d$; and (2) leads to new ones, including sample complexity upper bounds for arbitrary $k$-mixtures of Gaussians over $\mathbb R^d$, results for agnostic and distribution-shift resistant learners, as well as closure properties for public-private learnability under taking mixtures and products of distributions. Finally, via the connection to list learning, we show that for Gaussians in $\mathbb R^d$, at least $d$ public samples are necessary for private learnability, which is close to the known upper bound of $d+1$ public samples.

Submitted 14 August, 2023; v1 submitted 11 August, 2023; originally announced August 2023.

Comments: 31 pages

arXiv:2302.02936 [pdf, other]

Private GANs, Revisited

Authors: Alex Bie, Gautam Kamath, Guojun Zhang

Abstract: We show that the canonical approach for training differentially private GANs -- updating the discriminator with differentially private stochastic gradient descent (DPSGD) -- can yield significantly improved results after modifications to training. Specifically, we propose that existing instantiations of this approach neglect to consider how adding noise only to discriminator updates inhibits discriminator training, disrupting the balance between the generator and discriminator necessary for successful GAN training. We show that a simple fix -- taking more discriminator steps between generator steps -- restores parity between the generator and discriminator and improves results. Additionally, with the goal of restoring parity, we experiment with other modifications -- namely, large batch sizes and adaptive discriminator update frequency -- to improve discriminator training and see further improvements in generation quality. Our results demonstrate that on standard image synthesis benchmarks, DPSGD outperforms all alternative GAN privatization schemes. Code: https://github.com/alexbie98/dpgan-revisit.

Submitted 5 October, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

Comments: 28 pages; revisions and new experiments from TMLR camera-ready + code release at https://github.com/alexbie98/dpgan-revisit

arXiv:2208.07984 [pdf, other]

Private Estimation with Public Data

Authors: Alex Bie, Gautam Kamath, Vikrant Singhal

Abstract: We initiate the study of differentially private (DP) estimation with access to a small amount of public data. For private estimation of d-dimensional Gaussians, we assume that the public data comes from a Gaussian that may have vanishing similarity in total variation distance with the underlying Gaussian of the private data. We show that under the constraints of pure or concentrated DP, d+1 public data samples are sufficient to remove any dependence on the range parameters of the private data distribution from the private sample complexity, which is known to be otherwise necessary without public data. For separated Gaussian mixtures, we assume that the underlying public and private distributions are the same, and we consider two settings: (1) when given a dimension-independent amount of public data, the private sample complexity can be improved polynomially in terms of the number of mixture components, and any dependence on the range parameters of the distribution can be removed in the approximate DP case; (2) when given an amount of public data linear in the dimension, the private sample complexity can be made independent of range parameters even under concentrated DP, and additional improvements can be made to the overall sample complexity.

Submitted 5 April, 2023; v1 submitted 16 August, 2022; originally announced August 2022.

Comments: 55 pages; updated funding acknowledgement + simulation results from NeurIPS 2022 camera-ready

arXiv:2111.01177 [pdf, other]

Don't Generate Me: Training Differentially Private Generative Models with Sinkhorn Divergence

Authors: Tianshi Cao, Alex Bie, Arash Vahdat, Sanja Fidler, Karsten Kreis

Abstract: Although machine learning models trained on massive data have led to breakthroughs in several areas, their deployment in privacy-sensitive domains remains limited due to restricted access to data. Generative models trained with privacy constraints on private data can sidestep this challenge, providing indirect access to private data instead. We propose DP-Sinkhorn, a novel optimal transport-based generative method for learning data distributions from private data with differential privacy. DP-Sinkhorn minimizes the Sinkhorn divergence, a computationally efficient approximation to the exact optimal transport distance, between the model and data in a differentially private manner and uses a novel technique for controlling the bias-variance trade-off of gradient estimates. Unlike existing approaches for training differentially private generative models, which are mostly based on generative adversarial networks, we do not rely on adversarial objectives, which are notoriously difficult to optimize, especially in the presence of noise imposed by privacy constraints. Hence, DP-Sinkhorn is easy to train and deploy. Experimentally, we improve upon the state-of-the-art on multiple image modeling benchmarks and show differentially private synthesis of informative RGB images. Project page: https://nv-tlabs.github.io/DP-Sinkhorn.

Submitted 29 November, 2021; v1 submitted 1 November, 2021; originally announced November 2021.

Comments: Accepted to NeurIPS 2021. 13 pages, 7 pages of supplementary; 6 tables, 8 figures

Journal ref: Advances in Neural Information Processing Systems, Volume 34, pages 12480--12492, year 2021

arXiv:2103.15684 [pdf, other]

doi 10.1007/s10877-022-00822-4

A Model-Based Approach to Synthetic Data Set Generation for Patient-Ventilator Waveforms for Machine Learning and Educational Use

Authors: A. van Diepen, T. H. G. F. Bakkes, A. J. R. De Bie, S. Turco, R. A. Bouwman, P. H. Woerlee, M. Mischi

Abstract: Although mechanical ventilation is a lifesaving intervention in the ICU, it has harmful side-effects, such as barotrauma and volutrauma. These harms can occur due to asynchronies. Asynchronies are defined as a mismatch between the ventilator timing and patient respiratory effort. Automatic detection of these asynchronies, and subsequent feedback, would improve lung ventilation and reduce the probability of lung damage. Neural networks to detect asynchronies provide a promising new approach but require large annotated data sets, which are difficult to obtain and require complex monitoring of inspiratory effort. In this work, we propose a model-based approach to generate a synthetic data set for machine learning and educational use by extending an existing lung model with a first-order ventilator model. The physiological nature of the derived lung model allows adaptation to various disease archetypes, resulting in a diverse data set. We generated a synthetic data set using 9 different patient archetypes, which are derived from measurements in the literature. The model and synthetic data quality have been verified by comparison with clinical data, review by a clinical expert, and an artificial intelligence model that was trained on experimental data. The evaluation showed it was possible to generate patient-ventilator waveforms including asynchronies that have the most important features of experimental patient-ventilator waveforms.

Submitted 7 May, 2021; v1 submitted 29 March, 2021; originally announced March 2021.

Journal ref: J Clin Monit Comput (2022)

arXiv:1911.03604 [pdf, other]

A Simplified Fully Quantized Transformer for End-to-end Speech Recognition

Authors: Alex Bie, Bharat Venkitesh, Joao Monteiro, Md. Akmal Haidar, Mehdi Rezagholizadeh

Abstract: While significant improvements have been made in recent years in terms of end-to-end automatic speech recognition (ASR) performance, such improvements were obtained through the use of very large neural networks, unfit for embedded use on edge devices. That being said, in this paper, we work on simplifying and compressing Transformer-based encoder-decoder architectures for the end-to-end ASR task. We empirically introduce a more compact Speech-Transformer by investigating the impact of discarding particular modules on the performance of the model. Moreover, we evaluate reducing the numerical precision of our network's weights and activations while maintaining the performance of the full-precision model. Our experiments show that we can reduce the number of parameters of the full-precision model and then further compress the model 4x by fully quantizing to 8-bit fixed point precision.

Submitted 24 March, 2020; v1 submitted 8 November, 2019; originally announced November 2019.

Comments: Submitted to IEEE Signal Processing Letters. Minor changes in Section 3
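
Note: the 4x compression figure in the abstract above is consistent with moving from 32-bit to 8-bit storage per value. The sketch below shows that arithmetic alongside a simple symmetric linear quantizer. It is not the authors' scheme: the 32-bit baseline, the per-tensor scale, the symmetric [-127, 127] range, and all function names are assumptions made for illustration.

```python
def quantize_int8(values):
    """Symmetric linear quantization of floats to integers in [-127, 127].

    Returns the quantized integers and the scale needed to map them back.
    A single per-tensor scale is an assumption of this sketch.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map quantized integers back to approximate float values."""
    return [q * scale for q in quantized]


def compression_ratio(full_bits=32, quant_bits=8):
    """Storage ratio per value, assuming a 32-bit full-precision baseline."""
    return full_bits / quant_bits


# Hypothetical weights, used only to exercise the functions.
weights = [0.5, -1.0, 0.25, 0.0]
q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)
ratio = compression_ratio()
```

With this kind of rounding, each restored value differs from the original by at most half a quantization step (scale / 2), which is the precision given up in exchange for the smaller storage footprint.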

Showing 1–10 of 10 results for author: Bie, A