Showing 1–50 of 72 results for author: Nasr, M

  1. arXiv:2408.15450  [pdf, other]

    cs.LG cs.CV

    Avoiding Generative Model Writer's Block With Embedding Nudging

    Authors: Ali Zand, Milad Nasr

    Abstract: Generative image models, since their introduction, have become a global phenomenon. From new arts becoming possible to new vectors of abuse, many new capabilities have become available. One of the challenging issues with generative models is controlling the generation process, especially to prevent specific generation classes or instances. There are several reasons why one may want to control the output…

    Submitted 27 August, 2024; originally announced August 2024.

  2. arXiv:2408.00892  [pdf, other]

    q-bio.BM cs.LG

    Peptide Sequencing Via Protein Language Models

    Authors: Thuong Le Hoai Pham, Jillur Rahman Saurav, Aisosa A. Omere, Calvin J. Heyl, Mohammad Sadegh Nasr, Cody Tyler Reynolds, Jai Prakash Yadav Veerla, Helen H Shang, Justyn Jaworski, Alison Ravenscraft, Joseph Anthony Buonomo, Jacob M. Luber

    Abstract: We introduce a protein language model for determining the complete sequence of a peptide based on measurement of a limited set of amino acids. To date, protein sequencing relies on mass spectrometry, with some novel Edman degradation-based platforms able to sequence non-native peptides. Current protein sequencing techniques face limitations in accurately identifying all amino acids, hindering comp…

    Submitted 1 August, 2024; originally announced August 2024.

  3. arXiv:2405.20485  [pdf, other]

    cs.CR cs.CL cs.LG

    Phantom: General Trigger Attacks on Retrieval Augmented Language Generation

    Authors: Harsh Chaudhari, Giorgio Severi, John Abascal, Matthew Jagielski, Christopher A. Choquette-Choo, Milad Nasr, Cristina Nita-Rotaru, Alina Oprea

    Abstract: Retrieval Augmented Generation (RAG) expands the capabilities of modern large language models (LLMs) in chatbot applications, enabling developers to adapt and personalize the LLM output without expensive training or fine-tuning. RAG systems use an external knowledge database to retrieve the most relevant documents for a given query, providing this context to the LLM generator. While RAG achieves i…

    Submitted 30 May, 2024; originally announced May 2024.

  4. arXiv:2404.17342  [pdf, other]

    cs.CL cs.AI

    Can a Multichoice Dataset be Repurposed for Extractive Question Answering?

    Authors: Teresa Lynn, Malik H. Altakrori, Samar Mohamed Magdy, Rocktim Jyoti Das, Chenyang Lyu, Mohamed Nasr, Younes Samih, Alham Fikri Aji, Preslav Nakov, Shantanu Godbole, Salim Roukos, Radu Florian, Nizar Habash

    Abstract: The rapid evolution of Natural Language Processing (NLP) has favored major languages such as English, leaving a significant gap for many others due to limited resources. This is especially evident in the context of data annotation, a task whose importance cannot be underestimated, but which is time-consuming and costly. Thus, any dataset for resource-poor languages is precious, in particular when…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: Paper 8 pages, Appendix 12 pages. Submitted to ARR

  5. arXiv:2403.06634  [pdf, other]

    cs.CR

    Stealing Part of a Production Language Model

    Authors: Nicholas Carlini, Daniel Paleka, Krishnamurthy Dj Dvijotham, Thomas Steinke, Jonathan Hayase, A. Feder Cooper, Katherine Lee, Matthew Jagielski, Milad Nasr, Arthur Conmy, Itay Yona, Eric Wallace, David Rolnick, Florian Tramèr

    Abstract: We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI's ChatGPT or Google's PaLM-2. Specifically, our attack recovers the embedding projection layer (up to symmetries) of a transformer model, given typical API access. For under \…

    Submitted 9 July, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

  6. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1110 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  7. arXiv:2402.12329  [pdf, other]

    cs.CL cs.AI cs.CR cs.LG

    Query-Based Adversarial Prompt Generation

    Authors: Jonathan Hayase, Ema Borevkovic, Nicholas Carlini, Florian Tramèr, Milad Nasr

    Abstract: Recent work has shown it is possible to construct adversarial examples that cause an aligned language model to emit harmful strings or perform harmful behavior. Existing attacks work either in the white-box setting (with full access to the model weights), or through transferability: the phenomenon that adversarial examples crafted on one model often remain effective on other models. We improve on…

    Submitted 19 February, 2024; originally announced February 2024.

  8. arXiv:2402.09403  [pdf, other]

    cs.CR

    Auditing Private Prediction

    Authors: Karan Chadha, Matthew Jagielski, Nicolas Papernot, Christopher Choquette-Choo, Milad Nasr

    Abstract: Differential privacy (DP) offers a theoretical upper bound on the potential privacy leakage of an algorithm, while empirical auditing establishes a practical lower bound. Auditing techniques exist for DP training algorithms. However, machine learning can also be made private at inference. We propose the first framework for auditing private prediction where we instantiate adversaries with varying poiso…

    Submitted 14 February, 2024; originally announced February 2024.

  9. arXiv:2401.04343  [pdf, other]

    cs.LG cs.CL cs.CR

    Private Fine-tuning of Large Language Models with Zeroth-order Optimization

    Authors: Xinyu Tang, Ashwinee Panda, Milad Nasr, Saeed Mahloujifar, Prateek Mittal

    Abstract: Differentially private stochastic gradient descent (DP-SGD) allows models to be trained in a privacy-preserving manner, but has proven difficult to scale to the era of foundation models. We introduce DP-ZO, a private fine-tuning framework for large language models by privatizing zeroth order optimization methods. A key insight into the design of our method is that the direction of the gradient in…

    Submitted 12 August, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

  10. arXiv:2401.02882  [pdf, other]

    cs.HC q-bio.TO

    SpatialVisVR: An Immersive, Multiplexed Medical Image Viewer With Contextual Similar-Patient Search

    Authors: Jai Prakash Veerla, Partha Sai Guttikonda, Amir Hajighasemi, Jillur Rahman Saurav, Aarti Darji, Cody T. Reynolds, Mohamed Mohamed, Mohammad S. Nasr, Helen H. Shang, Jacob M. Luber

    Abstract: In contemporary pathology, multiplexed immunofluorescence (mIF) and multiplex immunohistochemistry (mIHC) present both significant opportunities and challenges. These methodologies shed light on intricate tumor microenvironment interactions, emphasizing the need for intuitive visualization tools to analyze vast biological datasets effectively. As electronic health records (EHR) proliferate and phy…

    Submitted 11 May, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

  11. arXiv:2401.02565  [pdf, other]

    eess.IV cs.CV q-bio.TO

    Demonstration of an Adversarial Attack Against a Multimodal Vision Language Model for Pathology Imaging

    Authors: Poojitha Thota, Jai Prakash Veerla, Partha Sai Guttikonda, Mohammad S. Nasr, Shirin Nilizadeh, Jacob M. Luber

    Abstract: In the context of medical artificial intelligence, this study explores the vulnerabilities of the Pathology Language-Image Pretraining (PLIP) model, a Vision Language Foundation model, under targeted attacks. Leveraging the Kather Colon dataset with 7,180 H&E images across nine tissue types, our investigation employs Projected Gradient Descent (PGD) adversarial perturbation attacks to induce miscl…

    Submitted 7 May, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

  12. arXiv:2401.02564  [pdf, other]

    q-bio.TO cs.CV

    Predicting Future States with Spatial Point Processes in Single Molecule Resolution Spatial Transcriptomics

    Authors: Parisa Boodaghi Malidarreh, Biraaj Rout, Mohammad Sadegh Nasr, Priyanshi Borad, Jillur Rahman Saurav, Jai Prakash Veerla, Kelli Fenelon, Theodora Koromila, Jacob M. Luber

    Abstract: In this paper, we introduce a pipeline based on Random Forest Regression to predict the future distribution of cells that are expressed by the Sog-D gene (active cells) in both the Anterior to posterior (AP) and the Dorsal to Ventral (DV) axis of the Drosophila in embryogenesis process. This method provides insights about how cells and living organisms control gene expression in super resolution w…

    Submitted 4 January, 2024; originally announced January 2024.

  13. arXiv:2312.15386  [pdf, other]

    physics.data-an astro-ph.EP eess.IV physics.ao-ph

    Hyperspectral shadow removal with Iterative Logistic Regression and latent Parametric Linear Combination of Gaussians

    Authors: Core Francisco Park, Maya Nasr, Manuel Pérez-Carrasco, Eleanor Walker, Douglas Finkbeiner, Cecilia Garraffo

    Abstract: Shadow detection and removal is a challenging problem in the analysis of hyperspectral images. Yet, this step is crucial for analyzing data for remote sensing applications like methane detection. In this work, we develop a shadow detection and removal method only based on the spectrum of each pixel and the overall distribution of spectral values. We first introduce Iterative Logistic Regression (I…

    Submitted 23 December, 2023; originally announced December 2023.

  14. arXiv:2312.12587  [pdf, other]

    eess.SP cs.DC q-bio.TO

    Real-Time Diagnostic Integrity Meets Efficiency: A Novel Platform-Agnostic Architecture for Physiological Signal Compression

    Authors: Neel R Vora, Amir Hajighasemi, Cody T. Reynolds, Amirmohammad Radmehr, Mohamed Mohamed, Jillur Rahman Saurav, Abdul Aziz, Jai Prakash Veerla, Mohammad S Nasr, Hayden Lotspeich, Partha Sai Guttikonda, Thuong Pham, Aarti Darji, Parisa Boodaghi Malidarreh, Helen H Shang, Jay Harvey, Kan Ding, Phuc Nguyen, Jacob M Luber

    Abstract: Head-based signals such as EEG, EMG, EOG, and ECG collected by wearable systems will play a pivotal role in clinical diagnosis, monitoring, and treatment of important brain disorder diseases. However, the real-time transmission of the significant corpus physiological signals over extended periods consumes substantial power and time, limiting the viability of battery-dependent physiological monit…

    Submitted 4 January, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

  15. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  16. arXiv:2311.17035  [pdf, other]

    cs.LG cs.CL cs.CR

    Scalable Extraction of Training Data from (Production) Language Models

    Authors: Milad Nasr, Nicholas Carlini, Jonathan Hayase, Matthew Jagielski, A. Feder Cooper, Daphne Ippolito, Christopher A. Choquette-Choo, Eric Wallace, Florian Tramèr, Katherine Lee

    Abstract: This paper studies extractable memorization: training data that an adversary can efficiently extract by querying a machine learning model without prior knowledge of the training dataset. We show an adversary can extract gigabytes of training data from open-source language models like Pythia or GPT-Neo, semi-open models like LLaMA or Falcon, and closed models like ChatGPT. Existing techniques from…

    Submitted 28 November, 2023; originally announced November 2023.

  17. arXiv:2311.06477  [pdf, other]

    cs.CY

    Report of the 1st Workshop on Generative AI and Law

    Authors: A. Feder Cooper, Katherine Lee, James Grimmelmann, Daphne Ippolito, Christopher Callison-Burch, Christopher A. Choquette-Choo, Niloofar Mireshghallah, Miles Brundage, David Mimno, Madiha Zahrah Choksi, Jack M. Balkin, Nicholas Carlini, Christopher De Sa, Jonathan Frankle, Deep Ganguli, Bryant Gipson, Andres Guadamuz, Swee Leng Harris, Abigail Z. Jacobs, Elizabeth Joh, Gautam Kamath, Mark Lemley, Cass Matthews, Christine McLeavey, Corynne McSherry, et al. (10 additional authors not shown)

    Abstract: This report presents the takeaways of the inaugural Workshop on Generative AI and Law (GenLaw), held in July 2023. A cross-disciplinary group of practitioners and scholars from computer science and law convened to discuss the technical, doctrinal, and policy challenges presented by law for Generative AI, and by Generative AI for law, with an emphasis on U.S. law in particular. We begin the report…

    Submitted 2 December, 2023; v1 submitted 10 November, 2023; originally announced November 2023.

  18. arXiv:2309.06006  [pdf, ps, other]

    cs.CV cs.AI

    SoccerNet 2023 Challenges Results

    Authors: Anthony Cioppa, Silvio Giancola, Vladimir Somers, Floriane Magera, Xin Zhou, Hassan Mkhallati, Adrien Deliège, Jan Held, Carlos Hinojosa, Amir M. Mansourian, Pierre Miralles, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdullah Kamal, Adrien Maglo, Albert Clapés, Amr Abdelaziz, Artur Xarles, Astrid Orcesi, Atom Scott, Bin Liu, Byoungkwon Lim, et al. (77 additional authors not shown)

    Abstract: The SoccerNet 2023 challenges were the third annual video understanding challenges organized by the SoccerNet team. For this third edition, the challenges were composed of seven vision-based tasks split into three main themes. The first theme, broadcast video understanding, is composed of three high-level tasks related to describing events occurring in the video broadcasts: (1) action spotting, fo…

    Submitted 12 September, 2023; originally announced September 2023.

  19. arXiv:2309.05610  [pdf, other]

    cs.CR cs.LG

    Privacy Side Channels in Machine Learning Systems

    Authors: Edoardo Debenedetti, Giorgio Severi, Nicholas Carlini, Christopher A. Choquette-Choo, Matthew Jagielski, Milad Nasr, Eric Wallace, Florian Tramèr

    Abstract: Most current approaches for protecting privacy in machine learning (ML) assume that models exist in a vacuum. Yet, in reality, these models are part of larger systems that include components for training data filtering, output monitoring, and more. In this work, we introduce privacy side channels: attacks that exploit these system-level components to extract private information at far higher rates…

    Submitted 18 July, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: USENIX Security 2024

  20. arXiv:2309.04858  [pdf, other]

    cs.LG cs.CL cs.CR

    Reverse-Engineering Decoding Strategies Given Blackbox Access to a Language Generation System

    Authors: Daphne Ippolito, Nicholas Carlini, Katherine Lee, Milad Nasr, Yun William Yu

    Abstract: Neural language models are increasingly deployed into APIs and websites that allow a user to pass in a prompt and receive generated text. Many of these systems do not reveal generation parameters. In this paper, we present methods to reverse-engineer the decoding method used to generate text (i.e., top-$k$ or nucleus sampling). Our ability to discover which decoding strategy was used has implicati…

    Submitted 9 September, 2023; originally announced September 2023.

    Comments: 6 pages, 4 figures, 3 tables. Also, 5 page appendix. Accepted to INLG 2023

  21. arXiv:2307.15043  [pdf, other]

    cs.CL cs.AI cs.CR cs.LG

    Universal and Transferable Adversarial Attacks on Aligned Language Models

    Authors: Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, Matt Fredrikson

    Abstract: Because "out-of-the-box" large language models are capable of generating a great deal of objectionable content, recent work has focused on aligning these models in an attempt to prevent undesirable generation. While there has been some success at circumventing these measures -- so-called "jailbreaks" against LLMs -- these attacks have required significant human ingenuity and are brittle in practic…

    Submitted 20 December, 2023; v1 submitted 27 July, 2023; originally announced July 2023.

    Comments: Website: http://llm-attacks.org/

  22. arXiv:2306.17019  [pdf, other]

    eess.IV cs.CV q-bio.TO

    Histopathology Slide Indexing and Search: Are We There Yet?

    Authors: Helen H. Shang, Mohammad Sadegh Nasr, Jai Prakash Veerla, Parisa Boodaghi Malidarreh, MD Jillur Rahman Saurav, Amir Hajighasemi, Manfred Huber, Chace Moleta, Jitin Makker, Jacob M. Luber

    Abstract: The search and retrieval of digital histopathology slides is an important task that has yet to be solved. In this case study, we investigate the clinical readiness of three state-of-the-art histopathology slide search engines, Yottixel, SISH, and RetCCL, on three patients with solid tumors. We provide a qualitative assessment of each model's performance in providing retrieval results that are reli…

    Submitted 4 January, 2024; v1 submitted 29 June, 2023; originally announced June 2023.

  23. arXiv:2306.16989  [pdf]

    q-bio.TO cs.CV eess.IV

    The State of Applying Artificial Intelligence to Tissue Imaging for Cancer Research and Early Detection

    Authors: Michael Robben, Amir Hajighasemi, Mohammad Sadegh Nasr, Jai Prakesh Veerla, Anne M. Alsup, Biraaj Rout, Helen H. Shang, Kelli Fowlds, Parisa Boodaghi Malidarreh, Paul Koomey, MD Jillur Rahman Saurav, Jacob M. Luber

    Abstract: Artificial intelligence represents a new frontier in human medicine that could save more lives and reduce the costs, thereby increasing accessibility. As a consequence, the rate of advancement of AI in cancer medical imaging and more particularly tissue pathology has exploded, opening it to ethical and technical questions that could impede its adoption into existing systems. In order to chart the…

    Submitted 29 June, 2023; originally announced June 2023.

    Journal ref: F1000Research 2023, 12:1436

  24. arXiv:2306.15447  [pdf, other]

    cs.CL cs.AI cs.CR cs.LG

    Are aligned neural networks adversarially aligned?

    Authors: Nicholas Carlini, Milad Nasr, Christopher A. Choquette-Choo, Matthew Jagielski, Irena Gao, Anas Awadalla, Pang Wei Koh, Daphne Ippolito, Katherine Lee, Florian Tramer, Ludwig Schmidt

    Abstract: Large language models are now tuned to align with the goals of their creators, namely to be "helpful and harmless." These models should respond helpfully to user questions, but refuse to answer requests that could cause harm. However, adversarial users can construct inputs which circumvent attempts at alignment. In this work, we study adversarial alignment, and ask to what extent these models rema…

    Submitted 6 May, 2024; v1 submitted 26 June, 2023; originally announced June 2023.

  25. arXiv:2306.06780  [pdf, other]

    eess.IV cs.CV q-bio.QM

    Multimodal Pathology Image Search Between H&E Slides and Multiplexed Immunofluorescent Images

    Authors: Amir Hajighasemi, MD Jillur Rahman Saurav, Mohammad S Nasr, Jai Prakash Veerla, Aarti Darji, Parisa Boodaghi Malidarreh, Michael Robben, Helen H Shang, Jacob M Luber

    Abstract: We present an approach for multimodal pathology image search, using dynamic time warping (DTW) on Variational Autoencoder (VAE) latent space that is fed into a ranked choice voting scheme to retrieve multiplexed immunofluorescent imaging (mIF) that is most similar to a query H&E slide. Through training the VAE and applying DTW, we align and compare mIF and H&E slides. Our method improves different…

    Submitted 11 June, 2023; originally announced June 2023.

  26. arXiv:2305.08846  [pdf, other]

    cs.LG cs.CR cs.DS

    Privacy Auditing with One (1) Training Run

    Authors: Thomas Steinke, Milad Nasr, Matthew Jagielski

    Abstract: We propose a scheme for auditing differentially private machine learning systems with a single training run. This exploits the parallelism of being able to add or remove multiple training examples independently. We analyze this using the connection between differential privacy and statistical generalization, which avoids the cost of group privacy. Our auditing scheme requires minimal assumptions a…

    Submitted 15 May, 2023; originally announced May 2023.

  27. arXiv:2305.05973  [pdf, other]

    cs.CL cs.CR cs.IR

    Synthetic Query Generation for Privacy-Preserving Deep Retrieval Systems using Differentially Private Language Models

    Authors: Aldo Gael Carranza, Rezsa Farahani, Natalia Ponomareva, Alex Kurakin, Matthew Jagielski, Milad Nasr

    Abstract: We address the challenge of ensuring differential privacy (DP) guarantees in training deep retrieval systems. Training these systems often involves the use of contrastive-style losses, which are typically non-per-example decomposable, making them difficult to directly DP-train with since common techniques require per-example gradients. To address this issue, we propose an approach that prioritizes…

    Submitted 23 May, 2024; v1 submitted 10 May, 2023; originally announced May 2023.

    Comments: Accepted to NAACL 2024

  28. arXiv:2303.13332  [pdf, other]

    eess.IV cs.CV q-bio.QM

    Clinically Relevant Latent Space Embedding of Cancer Histopathology Slides through Variational Autoencoder Based Image Compression

    Authors: Mohammad Sadegh Nasr, Amir Hajighasemi, Paul Koomey, Parisa Boodaghi Malidarreh, Michael Robben, Jillur Rahman Saurav, Helen H. Shang, Manfred Huber, Jacob M. Luber

    Abstract: In this paper, we introduce a Variational Autoencoder (VAE) based training approach that can compress and decompress cancer pathology slides at a compression ratio of 1:512, which is better than the previously reported state of the art (SOTA) in the literature, while still maintaining accuracy in clinical validation tasks. The compression approach was tested on more common computer vision datasets…

    Submitted 23 March, 2023; originally announced March 2023.

    Journal ref: 2023 IEEE ISBI, Cartagena, Colombia, 2023, pp. 1-5

  29. arXiv:2303.03446  [pdf, other]

    cs.CR cs.LG

    Students Parrot Their Teachers: Membership Inference on Model Distillation

    Authors: Matthew Jagielski, Milad Nasr, Christopher Choquette-Choo, Katherine Lee, Nicholas Carlini

    Abstract: Model distillation is frequently proposed as a technique to reduce the privacy leakage of machine learning. These empirical privacy defenses rely on the intuition that distilled "student" models protect the privacy of training data, as they only interact with this data indirectly through a "teacher" model. In this work, we design membership inference attacks to systematically study the privacy…

    Submitted 6 March, 2023; originally announced March 2023.

    Comments: 16 pages, 12 figures

  30. arXiv:2302.09483  [pdf, other]

    cs.LG

    Why Is Public Pretraining Necessary for Private Model Training?

    Authors: Arun Ganesh, Mahdi Haghifam, Milad Nasr, Sewoong Oh, Thomas Steinke, Om Thakkar, Abhradeep Thakurta, Lun Wang

    Abstract: In the privacy-utility tradeoff of a model trained on benchmark language and vision tasks, remarkable improvements have been widely reported with the use of pretraining on publicly available data. This is in part due to the benefits of transfer learning, which is the standard motivation for pretraining in non-private settings. However, the stark contrast in the improvement achieved through pretrai…

    Submitted 19 February, 2023; originally announced February 2023.

  31. arXiv:2302.07956  [pdf, other]

    cs.LG cs.CR

    Tight Auditing of Differentially Private Machine Learning

    Authors: Milad Nasr, Jamie Hayes, Thomas Steinke, Borja Balle, Florian Tramèr, Matthew Jagielski, Nicholas Carlini, Andreas Terzis

    Abstract: Auditing mechanisms for differential privacy use probabilistic means to empirically estimate the privacy level of an algorithm. For private machine learning, existing auditing mechanisms are tight: the empirical privacy estimate (nearly) matches the algorithm's provable privacy guarantee. But these auditing techniques suffer from two limitations. First, they only give tight estimates under implaus…

    Submitted 15 February, 2023; originally announced February 2023.

  32. arXiv:2301.13188  [pdf, other]

    cs.CR cs.CV cs.LG

    Extracting Training Data from Diffusion Models

    Authors: Nicholas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramèr, Borja Balle, Daphne Ippolito, Eric Wallace

    Abstract: Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted significant attention due to their ability to generate high-quality synthetic images. In this work, we show that diffusion models memorize individual images from their training data and emit them at generation time. With a generate-and-filter pipeline, we extract over a thousand training examples from state-of-the…

    Submitted 30 January, 2023; originally announced January 2023.

  33. arXiv:2210.17546  [pdf, other]

    cs.LG cs.CL

    Preventing Verbatim Memorization in Language Models Gives a False Sense of Privacy

    Authors: Daphne Ippolito, Florian Tramèr, Milad Nasr, Chiyuan Zhang, Matthew Jagielski, Katherine Lee, Christopher A. Choquette-Choo, Nicholas Carlini

    Abstract: Studying data memorization in neural language models helps us understand the risks (e.g., to privacy or copyright) associated with models regurgitating training data and aids in the development of countermeasures. Many prior works -- and some recently deployed defenses -- focus on "verbatim memorization", defined as a model generation that exactly matches a substring from the training set. We argu…

    Submitted 11 September, 2023; v1 submitted 31 October, 2022; originally announced October 2022.

  34. arXiv:2209.14987  [pdf, other]

    cs.LG cs.CR

    No Free Lunch in "Privacy for Free: How does Dataset Condensation Help Privacy"

    Authors: Nicholas Carlini, Vitaly Feldman, Milad Nasr

    Abstract: New methods designed to preserve data privacy require careful scrutiny. Failure to preserve privacy is hard to detect, and yet can lead to catastrophic results when a system implementing a "privacy-preserving" method is attacked. A recent work selected for an Outstanding Paper Award at ICML 2022 (Dong et al., 2022) claims that dataset condensation (DC) significantly improves data privacy when tr…

    Submitted 29 September, 2022; originally announced September 2022.

  35. arXiv:2206.03852  [pdf, other]

    cs.IR cs.LG

    FEL: High Capacity Learning for Recommendation and Ranking via Federated Ensemble Learning

    Authors: Meisam Hejazinia, Dzmitry Huba, Ilias Leontiadis, Kiwan Maeng, Mani Malek, Luca Melis, Ilya Mironov, Milad Nasr, Kaikai Wang, Carole-Jean Wu

    Abstract: Federated learning (FL) has emerged as an effective approach to address consumer privacy needs. FL has been successfully applied to certain machine learning tasks, such as training smart keyboard models and keyword spotting. Despite FL's initial success, many important deep learning use cases, such as ranking and recommendation tasks, have been limited from on-device learning. One of the key chall…

    Submitted 7 June, 2022; originally announced June 2022.

  36. arXiv:2205.10373  [pdf, other]

    eess.IV cs.CV q-bio.QM q-bio.TO

    A SSIM Guided cGAN Architecture For Clinically Driven Generative Image Synthesis of Multiplexed Spatial Proteomics Channels

    Authors: Jillur Rahman Saurav, Mohammad Sadegh Nasr, Paul Koomey, Michael Robben, Manfred Huber, Jon Weidanz, Bríd Ryan, Eytan Ruppin, Peng Jiang, Jacob M. Luber

    Abstract: Here we present a structural similarity index measure (SSIM) guided conditional Generative Adversarial Network (cGAN) that generatively performs image-to-image (i2i) synthesis to generate photo-accurate protein channels in multiplexed spatial proteomics images. This approach can be utilized to accurately generate missing spatial proteomics channels that were not included during experimental data c…

    Submitted 11 June, 2023; v1 submitted 20 May, 2022; originally announced May 2022.

    Journal ref: 2023 IEEE CIBCB, Eindhoven, Netherlands, 2023, pp. 1-8

  37. arXiv:2203.09718  [pdf]

    physics.acc-ph hep-ex

    An Impartial Perspective for Superconducting Nb3Sn coated Copper RF Cavities for Future Accelerators

    Authors: E. Barzi, B. C. Barish, R. A. Rimmer, A. Valente-Feliciano, C. M. Rey, W. A. Barletta, E. Nanni, M. Nasr, M. Ross, M. Schneider, S. Tantawi, P. B. Welander, E. I. Simakov, I. O. Usov, L. Alff, N. Karabas, M. Major, J. P. Palakkal, S. Petzold, N. Pietralla, N. Schäfer, A. Kikuchi, H. Hayano, H. Ito, S. Kashiwaji, et al. (10 additional authors not shown)

    Abstract: This Snowmass21 Contributed Paper encourages the Particle Physics community in fostering R&D in Superconducting Nb3Sn coated Copper RF Cavities instead of costly bulk Niobium. It describes the pressing need to devote effort in this direction, which would deliver higher gradient and higher temperature of operation and reduce the overall capital and operational costs of any future collider. It is un…

    Submitted 26 March, 2022; v1 submitted 17 March, 2022; originally announced March 2022.

    Comments: Contribution to Snowmass 2021

    Report number: FERMILAB-CONF-22-134-TD

  38. arXiv:2112.03570  [pdf, other]

    cs.CR cs.LG

    Membership Inference Attacks From First Principles

    Authors: Nicholas Carlini, Steve Chien, Milad Nasr, Shuang Song, Andreas Terzis, Florian Tramer

    Abstract: A membership inference attack allows an adversary to query a trained machine learning model to predict whether or not a particular example was contained in the model's training dataset. These attacks are currently evaluated using average-case "accuracy" metrics that fail to characterize whether the attack can confidently identify any members of the training set. We argue that attacks should instea…

    Submitted 12 April, 2022; v1 submitted 7 December, 2021; originally announced December 2021.

  39. arXiv:2110.15800  [pdf, other]

    hep-ex hep-ph physics.acc-ph

    C$^3$: A "Cool" Route to the Higgs Boson and Beyond

    Authors: Mei Bai, Tim Barklow, Rainer Bartoldus, Martin Breidenbach, Philippe Grenier, Zhirong Huang, Michael Kagan, John Lewellen, Zenghai Li, Thomas W. Markiewicz, Emilio A. Nanni, Mamdouh Nasr, Cho-Kuen Ng, Marco Oriunno, Michael E. Peskin, Thomas G. Rizzo, James Rosenzweig, Ariel G. Schwartzman, Vladimir Shiltsev, Evgenya Simakov, Bruno Spataro, Dong Su, Sami Tantawi, Caterina Vernieri, Glen White, et al. (1 additional author not shown)

    Abstract: We present a proposal for a cold copper distributed coupling accelerator that can provide a rapid route to precision Higgs physics with a compact 8 km footprint. This proposal is based on recent advances that increase the efficiency and operating gradient of a normal conducting accelerator. This technology also provides an $e^{+}e^{-}$ collider path to physics at multi-TeV energies. In this articl…

    Submitted 27 October, 2021; originally announced October 2021.

    Comments: 34 pages, 8 figures, Contribution to Snowmass 2021.

    Report number: SLAC-PUB-17629

  40. arXiv:2110.08324  [pdf, other]

    cs.CR cs.LG

    Mitigating Membership Inference Attacks by Self-Distillation Through a Novel Ensemble Architecture

    Authors: Xinyu Tang, Saeed Mahloujifar, Liwei Song, Virat Shejwalkar, Milad Nasr, Amir Houmansadr, Prateek Mittal

    Abstract: Membership inference attacks are a key measure to evaluate privacy leakage in machine learning (ML) models. These attacks aim to distinguish training members from non-members by exploiting differential behavior of the models on member and non-member inputs. The goal of this work is to train ML models that have high membership privacy while largely preserving their utility; we therefore aim for an…

    Submitted 15 October, 2021; originally announced October 2021.

  41. arXiv:2107.03924  [pdf]

    cs.CY cs.AI cs.HC cs.MA cs.NI

    Smart Healthcare in the Age of AI: Recent Advances, Challenges, and Future Prospects

    Authors: Mahmoud Nasr, MD. Milon Islam, Shady Shehata, Fakhri Karray, Yuri Quintana

    Abstract: The significant increase in the number of individuals with chronic ailments (including the elderly and disabled) has dictated an urgent need for an innovative model for healthcare systems. The evolved model will be more personalized and less reliant on traditional brick-and-mortar healthcare institutions such as hospitals, nursing homes, and long-term healthcare centers. The smart healthcare syste…

    Submitted 24 June, 2021; originally announced July 2021.

  42. arXiv:2102.00918  [pdf, other]

    cs.CR

    Robust Adversarial Attacks Against DNN-Based Wireless Communication Systems

    Authors: Alireza Bahramali, Milad Nasr, Amir Houmansadr, Dennis Goeckel, Don Towsley

    Abstract: Deep Neural Networks (DNNs) have become prevalent in wireless communication systems due to their promising performance. However, similar to other DNN-based applications, they are vulnerable to adversarial examples. In this work, we propose an input-agnostic, undetectable, and robust adversarial attack against DNN-based wireless communication systems in both white-box and black-box scenarios. We de…

    Submitted 1 February, 2021; originally announced February 2021.

  43. arXiv:2101.04535  [pdf, other]

    cs.LG cs.CR

    Adversary Instantiation: Lower Bounds for Differentially Private Machine Learning

    Authors: Milad Nasr, Shuang Song, Abhradeep Thakurta, Nicolas Papernot, Nicholas Carlini

    Abstract: Differentially private (DP) machine learning allows us to train models on private data while limiting data leakage. DP formalizes this data leakage through a cryptographic game, where an adversary must predict if a model was trained on a dataset D, or a dataset D' that differs in just one example. If observing the training algorithm does not meaningfully increase the adversary's odds of successfull…

    Submitted 11 January, 2021; originally announced January 2021.

  44. arXiv:2012.15539  [pdf, ps, other]

    math.NT

    On algebraic integers which are 2-Salem elements in positive characteristic

    Authors: Mabrouk Nasr, Hassen Kthiri, Jean-Louis Verger-Gaugry

    Abstract: Bateman and Duquette have initiated the study of Salem elements in positive characteristic. This work extends their results to 2-Salem elements whose minimal polynomials are of the type $Y^n + \lambda_{n-1}Y^{n-1} + \ldots + \lambda_1 Y + \lambda_0 \in \mathbb{F}_q[X][Y]$ where $n \geq 2$, $\lambda_0 \neq 0$ and $\deg \lambda_{n-1} < \deg \lambda_{n-2} = \max_{i \neq n-2} \deg(\lambda_i)$. This work provides an analogue of their results for 2-Salem elements…

    Submitted 31 December, 2020; originally announced December 2020.

  45. Experimental demonstration of particle acceleration with normal conducting accelerating structure at cryogenic temperature

    Authors: Mamdouh Nasr, Emilio Nanni, Martin Breidenbach, Stephen Weathersby, Marco Oriunno, Sami Tantawi

    Abstract: Reducing the operating temperature of normal conducting particle accelerators substantially increases their efficiency. Low-temperature operation increases the yield strength of the accelerator material and reduces surface resistance, hence a great reduction in cyclic fatigue could be achieved resulting in a large reduction in breakdown rates compared to room-temperature operation. Furthermore, te…

    Submitted 31 October, 2020; originally announced November 2020.

    Journal ref: Phys. Rev. Accel. Beams 24, 093201 (2021)

  46. Benchmarking Meta-heuristic Optimization

    Authors: Mona Nasr, Omar Farouk, Ahmed Mohamedeen, Ali Elrafie, Marwan Bedeir, Ali Khaled

    Abstract: Solving an optimization task in any domain is a very challenging problem, especially when dealing with nonlinear problems and non-convex functions. Many meta-heuristic algorithms are very efficient when solving nonlinear functions. A meta-heuristic algorithm is a problem-independent technique that can be applied to a broad range of problems. In this experiment, some of the evolutionary algorithms…

    Submitted 27 July, 2020; originally announced July 2020.

    Comments: International Journal of Advanced Networking and Applications - IJANA

  47. arXiv:2007.11524  [pdf, ps, other]

    cs.LG cs.CR stat.ML

    Improving Deep Learning with Differential Privacy using Gradient Encoding and Denoising

    Authors: Milad Nasr, Reza Shokri, Amir Houmansadr

    Abstract: Deep learning models leak significant amounts of information about their training datasets. Previous work has investigated training models with differential privacy (DP) guarantees through adding DP noise to the gradients. However, such solutions (specifically, DPSGD) result in large degradations in the accuracy of the trained models. In this paper, we aim at training deep learning models with DP…

    Submitted 22 July, 2020; originally announced July 2020.

  48. arXiv:2003.12591  [pdf, other]

    quant-ph physics.optics

    Spectrally reconfigurable quantum emitters enabled by optimized fast modulation

    Authors: Daniil M. Lukin, Alexander D. White, Rahul Trivedi, Melissa A. Guidry, Naoya Morioka, Charles Babin, Öney O. Soykal, Jawad Ul Hassan, Nguyen Tien Son, Takeshi Ohshima, Praful K. Vasireddy, Mamdouh H. Nasr, Shuo Sun, Jean-Phillipe W. MacLean, Constantin Dory, Emilio A. Nanni, Jörg Wrachtrup, Florian Kaiser, Jelena Vučković

    Abstract: The ability to shape photon emission facilitates strong photon-mediated interactions between disparate physical systems, thereby enabling applications in quantum information processing, simulation and communication. Spectral control in solid state platforms such as color centers, rare earth ions, and quantum dots is particularly attractive for realizing such applications on-chip. Here we propose t…

    Submitted 27 July, 2020; v1 submitted 27 March, 2020; originally announced March 2020.

    Comments: 9 pages, 6 figures; Supplementary Information

    Journal ref: npj Quantum Inf 6, 80 (2020)

  49. arXiv:2003.06083  [pdf, other]

    physics.acc-ph cond-mat.mtrl-sci hep-ex physics.app-ph physics.bio-ph

    An Ultra-Compact X-Ray Free-Electron Laser

    Authors: J. B. Rosenzweig, N. Majernik, R. R. Robles, G. Andonian, O. Camacho, A. Fukasawa, A. Kogar, G. Lawler, Jianwei Miao, P. Musumeci, B. Naranjo, Y. Sakai, R. Candler, B. Pound, C. Pellegrini, C. Emma, A. Halavanau, J. Hastings, Z. Li, M. Nasr, S. Tantawi, P. Anisimov, B. Carlsten, F. Krawczyk, E. Simakov, et al. (11 additional authors not shown)

    Abstract: In the field of beam physics, two frontier topics have taken center stage due to their potential to enable new approaches to discovery in a wide swath of science. These areas are: advanced, high gradient acceleration techniques, and x-ray free electron lasers (XFELs). Further, there is intense interest in the marriage of these two fields, with the goal of producing a very compact XFEL. In this con…

    Submitted 14 August, 2020; v1 submitted 12 March, 2020; originally announced March 2020.

    Comments: 80 pages, 24 figures

  50. arXiv:2002.06495  [pdf, other]

    cs.CR cs.LG

    Blind Adversarial Network Perturbations

    Authors: Milad Nasr, Alireza Bahramali, Amir Houmansadr

    Abstract: Deep Neural Networks (DNNs) are commonly used for various traffic analysis problems, such as website fingerprinting and flow correlation, as they outperform traditional (e.g., statistical) techniques by large margins. However, deep neural networks are known to be vulnerable to adversarial examples: adversarial inputs to the model that get labeled incorrectly by the model due to small adversarial p…

    Submitted 15 February, 2020; originally announced February 2020.