Showing 1–7 of 7 results for author: Brown, H

  1. arXiv:2407.03234  [pdf, other]

    cs.LG cs.CL cs.CR

    Self-Evaluation as a Defense Against Adversarial Attacks on LLMs

    Authors: Hannah Brown, Leon Lin, Kenji Kawaguchi, Michael Shieh

    Abstract: We introduce a defense against adversarial attacks on LLMs utilizing self-evaluation. Our method requires no model fine-tuning, instead using pre-trained models to evaluate the inputs and outputs of a generator model, significantly reducing the cost of implementation in comparison to other, finetuning-based methods. Our method can significantly reduce the attack success rate of attacks on both ope…

    Submitted 6 August, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: 8 pages, 7 figures

  2. arXiv:2407.03232  [pdf, other]

    cs.LG cs.CL

    Single Character Perturbations Break LLM Alignment

    Authors: Leon Lin, Hannah Brown, Kenji Kawaguchi, Michael Shieh

    Abstract: When LLMs are deployed in sensitive, human-facing settings, it is crucial that they do not output unsafe, biased, or privacy-violating outputs. For this reason, models are both trained and instructed to refuse to answer unsafe prompts such as "Tell me how to build a bomb." We find that, despite these safeguards, it is possible to break model defenses simply by appending a space to the end of a mod…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 8 pages, 6 figures

  3. arXiv:2401.01623  [pdf, other]

    cs.AI cs.CL

    Can AI Be as Creative as Humans?

    Authors: Haonan Wang, James Zou, Michael Mozer, Anirudh Goyal, Alex Lamb, Linjun Zhang, Weijie J Su, Zhun Deng, Michael Qizhe Xie, Hannah Brown, Kenji Kawaguchi

    Abstract: Creativity serves as a cornerstone for societal progress and innovation. With the rise of advanced generative AI models capable of tasks once reserved for human creativity, the study of AI's creative potential becomes imperative for its responsible development and application. In this paper, we prove in theory that AI can be as creative as humans under the condition that it can properly fit the da…

    Submitted 25 January, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

    Comments: The paper examines AI's creativity, introducing Relative and Statistical Creativity for theoretical and practical analysis, along with practical training guidelines. Project Page: ai-relative-creativity.github.io

  4. arXiv:2312.02614  [pdf, other]

    cs.LG cs.CL

    Prompt Optimization via Adversarial In-Context Learning

    Authors: Xuan Long Do, Yiran Zhao, Hannah Brown, Yuxi Xie, James Xu Zhao, Nancy F. Chen, Kenji Kawaguchi, Michael Shieh, Junxian He

    Abstract: We propose a new method, Adversarial In-Context Learning (adv-ICL), to optimize prompts for in-context learning (ICL) by employing one LLM as a generator, another as a discriminator, and a third as a prompt modifier. As in traditional adversarial learning, adv-ICL is implemented as a two-player game between the generator and discriminator, where the generator tries to generate realistic enough outp…

    Submitted 22 June, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: ACL 2024

  5. arXiv:2310.06514  [pdf, other]

    cs.LG

    AttributionLab: Faithfulness of Feature Attribution Under Controllable Environments

    Authors: Yang Zhang, Yawei Li, Hannah Brown, Mina Rezaei, Bernd Bischl, Philip Torr, Ashkan Khakzar, Kenji Kawaguchi

    Abstract: Feature attribution explains neural network outputs by identifying relevant input features. The attribution has to be faithful, meaning that the attributed features must mirror the input features that influence the output. One recent trend to test faithfulness is to fit a model on designed data with known relevant features and then compare attributions with ground truth input features. This idea as…

    Submitted 14 February, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: Appears at NeurIPS 2023 Workshop XAIA

  6. arXiv:2306.12609  [pdf, other]

    cs.AI cs.CY

    Towards Regulatable AI Systems: Technical Gaps and Policy Opportunities

    Authors: Xudong Shen, Hannah Brown, Jiashu Tao, Martin Strobel, Yao Tong, Akshay Narayan, Harold Soh, Finale Doshi-Velez

    Abstract: There is increasing attention being given to how to regulate AI systems. As governing bodies grapple with what values to encapsulate into regulation, we consider the technical half of the question: To what extent can AI experts vet an AI system for adherence to regulatory requirements? We investigate this question through the lens of two public sector procurement checklists, identifying what we ca…

    Submitted 27 March, 2024; v1 submitted 21 June, 2023; originally announced June 2023.

    Comments: Scheduled for publication in the Communications of the ACM, titled "Directions of Technical Innovation for Regulatable AI Systems"

  7. arXiv:2202.05520  [pdf, other]

    stat.ML cs.CL cs.LG

    What Does it Mean for a Language Model to Preserve Privacy?

    Authors: Hannah Brown, Katherine Lee, Fatemehsadat Mireshghallah, Reza Shokri, Florian Tramèr

    Abstract: Natural language reflects our private lives and identities, making its privacy concerns as broad as those of real life. Language models lack the ability to understand the context and sensitivity of text, and tend to memorize phrases present in their training sets. An adversary can exploit this tendency to extract training data. Depending on the nature of the content and the context in which this d…

    Submitted 14 February, 2022; v1 submitted 11 February, 2022; originally announced February 2022.

    Comments: 21 pages, 2 figures