Showing 1–16 of 16 results for author: Terzis, A

Searching in archive cs.
  1. arXiv:2407.12108

    cs.LG cs.CL cs.CR

    Private prediction for large-scale synthetic text generation

    Authors: Kareem Amin, Alex Bie, Weiwei Kong, Alexey Kurakin, Natalia Ponomareva, Umar Syed, Andreas Terzis, Sergei Vassilvitskii

    Abstract: We present an approach for generating differentially private synthetic text using large language models (LLMs), via private prediction. In the private prediction framework, we only require the output synthetic data to satisfy differential privacy guarantees. This is in contrast to approaches that train a generative model on potentially sensitive user-supplied source data and seek to ensure the mod…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 12 pages main text + 15 pages appendix

  2. arXiv:2403.05530

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1110 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  3. arXiv:2306.01684

    cs.LG cs.CR

    Harnessing large-language models to generate private synthetic text

    Authors: Alexey Kurakin, Natalia Ponomareva, Umar Syed, Liam MacDermed, Andreas Terzis

    Abstract: Differentially private training algorithms like DP-SGD protect sensitive training data by ensuring that trained models do not reveal private information. An alternative approach, which this paper studies, is to use a sensitive dataset to generate synthetic data that is differentially private with respect to the original data, and then non-privately train a model on the synthetic data. Doing so…

    Submitted 10 January, 2024; v1 submitted 2 June, 2023; originally announced June 2023.

    Comments: 31 pages; 7 figures; compared to the previous version, adds results of LoRA fine-tuning

  4. arXiv:2304.06929

    cs.CR

    Advancing Differential Privacy: Where We Are Now and Future Directions for Real-World Deployment

    Authors: Rachel Cummings, Damien Desfontaines, David Evans, Roxana Geambasu, Yangsibo Huang, Matthew Jagielski, Peter Kairouz, Gautam Kamath, Sewoong Oh, Olga Ohrimenko, Nicolas Papernot, Ryan Rogers, Milan Shen, Shuang Song, Weijie Su, Andreas Terzis, Abhradeep Thakurta, Sergei Vassilvitskii, Yu-Xiang Wang, Li Xiong, Sergey Yekhanin, Da Yu, Huanyu Zhang, Wanrong Zhang

    Abstract: In this article, we present a detailed review of current practices and state-of-the-art methodologies in the field of differential privacy (DP), with a focus on advancing DP's deployment in real-world applications. Key points and high-level contents of the article originated from the discussions at "Differential Privacy (DP): Challenges Towards the Next Frontier," a workshop held in July 20…

    Submitted 12 March, 2024; v1 submitted 14 April, 2023; originally announced April 2023.

  5. arXiv:2302.10149

    cs.CR cs.LG

    Poisoning Web-Scale Training Datasets is Practical

    Authors: Nicholas Carlini, Matthew Jagielski, Christopher A. Choquette-Choo, Daniel Paleka, Will Pearce, Hyrum Anderson, Andreas Terzis, Kurt Thomas, Florian Tramèr

    Abstract: Deep learning models are often trained on distributed, web-scale datasets crawled from the internet. In this paper, we introduce two new dataset poisoning attacks that intentionally introduce malicious examples to a model's performance. Our attacks are immediately practical and could, today, poison 10 popular datasets. Our first attack, split-view poisoning, exploits the mutable nature of internet…

    Submitted 6 May, 2024; v1 submitted 20 February, 2023; originally announced February 2023.

  6. arXiv:2302.07956

    cs.LG cs.CR

    Tight Auditing of Differentially Private Machine Learning

    Authors: Milad Nasr, Jamie Hayes, Thomas Steinke, Borja Balle, Florian Tramèr, Matthew Jagielski, Nicholas Carlini, Andreas Terzis

    Abstract: Auditing mechanisms for differential privacy use probabilistic means to empirically estimate the privacy level of an algorithm. For private machine learning, existing auditing mechanisms are tight: the empirical privacy estimate (nearly) matches the algorithm's provable privacy guarantee. But these auditing techniques suffer from two limitations. First, they only give tight estimates under implaus…

    Submitted 15 February, 2023; originally announced February 2023.

  7. arXiv:2206.10469

    cs.LG cs.CR

    The Privacy Onion Effect: Memorization is Relative

    Authors: Nicholas Carlini, Matthew Jagielski, Chiyuan Zhang, Nicolas Papernot, Andreas Terzis, Florian Tramer

    Abstract: Machine learning models trained on private datasets have been shown to leak their private data. While recent work has found that the average data point is rarely leaked, the outlier samples are frequently subject to memorization and, consequently, privacy leakage. We demonstrate and analyse an Onion Effect of memorization: removing the "layer" of outlier points that are most vulnerable to a privac…

    Submitted 22 June, 2022; v1 submitted 21 June, 2022; originally announced June 2022.

  8. arXiv:2202.12219

    cs.LG

    Debugging Differential Privacy: A Case Study for Privacy Auditing

    Authors: Florian Tramer, Andreas Terzis, Thomas Steinke, Shuang Song, Matthew Jagielski, Nicholas Carlini

    Abstract: Differential Privacy can provide provable privacy guarantees for training data in machine learning. However, the presence of proofs does not preclude the presence of errors. Inspired by recent advances in auditing which have been used for estimating lower bounds on differentially private algorithms, here we show that auditing can also be used to find flaws in (purportedly) differentially private s…

    Submitted 28 March, 2022; v1 submitted 24 February, 2022; originally announced February 2022.

  9. arXiv:2201.12328

    cs.LG

    Toward Training at ImageNet Scale with Differential Privacy

    Authors: Alexey Kurakin, Shuang Song, Steve Chien, Roxana Geambasu, Andreas Terzis, Abhradeep Thakurta

    Abstract: Differential privacy (DP) is the de facto standard for training machine learning (ML) models, including neural networks, while ensuring the privacy of individual examples in the training set. Despite a rich literature on how to train ML models with differential privacy, it remains extremely challenging to train real-life, large neural networks with both reasonable accuracy and privacy. We set ou…

    Submitted 8 February, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

    Comments: 25 pages, 7 figures. Code available at https://github.com/google-research/dp-imagenet

  10. arXiv:2112.03570

    cs.CR cs.LG

    Membership Inference Attacks From First Principles

    Authors: Nicholas Carlini, Steve Chien, Milad Nasr, Shuang Song, Andreas Terzis, Florian Tramer

    Abstract: A membership inference attack allows an adversary to query a trained machine learning model to predict whether or not a particular example was contained in the model's training dataset. These attacks are currently evaluated using average-case "accuracy" metrics that fail to characterize whether the attack can confidently identify any members of the training set. We argue that attacks should instea…

    Submitted 12 April, 2022; v1 submitted 7 December, 2021; originally announced December 2021.

  11. arXiv:2106.09667

    cs.LG

    Poisoning and Backdooring Contrastive Learning

    Authors: Nicholas Carlini, Andreas Terzis

    Abstract: Multimodal contrastive learning methods like CLIP train on noisy and uncurated training datasets. This is cheaper than labeling datasets manually, and even improves out-of-distribution robustness. We show that this practice makes backdoor and poisoning attacks a significant threat. By poisoning just 0.01% of a dataset (e.g., just 300 images of the 3 million-example Conceptual Captions dataset), we…

    Submitted 28 March, 2022; v1 submitted 17 June, 2021; originally announced June 2021.

  12. Phoenix: An Epidemic Approach to Time Reconstruction

    Authors: Jayant Gupchup, Douglas Carlson, Răzvan Musăloiu-E., Alex Szalay, Andreas Terzis

    Abstract: Harsh deployment environments and uncertain run-time conditions create numerous challenges for postmortem time reconstruction methods. For example, motes often reboot and thus lose their clock state, considering that the majority of mote platforms lack a real-time clock. While existing time reconstruction methods for long-term data gathering networks rely on a persistent basestation for assigning…

    Submitted 2 February, 2019; originally announced February 2019.

    Journal ref: EWSN 2010 Proceedings of the 7th European Conference on Wireless Sensor Networks

  13. arXiv:1601.00960

    cs.CY

    High Frequency Remote Monitoring of Parkinson's Disease via Smartphone: Platform Overview and Medication Response Detection

    Authors: Andong Zhan, Max A. Little, Denzil A. Harris, Solomon O. Abiola, E. Ray Dorsey, Suchi Saria, Andreas Terzis

    Abstract: Objective: The aim of this study is to develop a smartphone-based high-frequency remote monitoring platform, assess its feasibility for remote monitoring of symptoms in Parkinson's disease, and demonstrate the value of data collected using the platform by detecting dopaminergic medication response. Methods: We have developed HopkinsPD, a novel smartphone-based monitoring platform, which measures s…

    Submitted 5 January, 2016; originally announced January 2016.

  14. arXiv:1408.2284

    cs.DC

    Hadoop in Low-Power Processors

    Authors: Da Zheng, Alexander Szalay, Andreas Terzis

    Abstract: In our previous work we introduced a so-called Amdahl blade microserver that combines a low-power Atom processor with a GPU and an SSD to provide a balanced and energy-efficient system. Our preliminary results suggested that the sequential I/O of Amdahl blades can be ten times higher than that of a cluster of conventional servers with comparable power consumption. In this paper we investigate the pe…

    Submitted 10 August, 2014; originally announced August 2014.

  15. arXiv:0901.3923

    cs.NI cs.CV

    Model-Based Event Detection in Wireless Sensor Networks

    Authors: Jayant Gupchup, Andreas Terzis, Randal Burns, Alex Szalay

    Abstract: In this paper we present an application of techniques from statistical signal processing to the problem of event detection in wireless sensor networks used for environmental monitoring. The proposed approach uses the well-established Principal Component Analysis (PCA) technique to build a compact model of the observed phenomena that is able to capture daily and seasonal trends in the collected m…

    Submitted 25 January, 2009; originally announced January 2009.

    Journal ref: Workshop for Data Sharing and Interoperability on the World Wide Web (DSI 2007). April 2007, In Proceedings

  16. arXiv:cs/0701170

    cs.DB cs.CE

    Life Under Your Feet: An End-to-End Soil Ecology Sensor Network, Database, Web Server, and Analysis Service

    Authors: Katalin Szlavecz, Andreas Terzis, Stuart Ozer, Razvan Musaloiu-E, Joshua Cogan, Sam Small, Randal Burns, Jim Gray, Alex Szalay

    Abstract: Wireless sensor networks can revolutionize soil ecology by providing measurements at temporal and spatial granularities previously impossible. This paper presents a soil monitoring system we developed and deployed at an urban forest in Baltimore as a first step towards realizing this vision. Motes in this network measure and save soil moisture and temperature in situ every minute. Raw measuremen…

    Submitted 26 January, 2007; originally announced January 2007.

    Report number: MSR TR 2006 90