
Showing 1–1 of 1 results for author: Wei, F A

  1. arXiv:2403.13213  [pdf, other]

    cs.LG cs.CL cs.CY

    From Representational Harms to Quality-of-Service Harms: A Case Study on Llama 2 Safety Safeguards

    Authors: Khaoula Chehbouni, Megha Roshan, Emmanuel Ma, Futian Andrew Wei, Afaf Taik, Jackie CK Cheung, Golnoosh Farnadi

    Abstract: Recent progress in large language models (LLMs) has led to their widespread adoption in various domains. However, these advancements have also introduced additional safety risks and raised concerns regarding their detrimental impact on already marginalized populations. Despite growing mitigation efforts to develop safety safeguards, such as supervised safety-oriented fine-tuning and leveraging saf…

    Submitted 5 July, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: 9 pages, 4 figures. Accepted to Findings of the Association for Computational Linguistics: ACL 2024