The Art of Saying No: Contextual Noncompliance in Language Models

We introduce 🥥 CoCoNot, a resource for benchmarking and enhancing noncompliance behavior of large language models.

📄 Data

CoCoNot contains two components:

  • Original Set: For testing and improving contextual noncompliance in LMs.

    • This set contains 1,001 evaluation and 11,477 SFT training examples.
  • Contrast Set: For testing and mitigating exaggerated noncompliance (over-refusals) in LMs.

    • This set contains 379 evaluation and 927 preference data examples.

You can also view and download 🥥 CoCoNot on the 🤗 Hugging Face Hub, or load it directly with the datasets library:

from datasets import load_dataset


# load original test set
coconot_eval = load_dataset("allenai/coconot", "original", split="test")

# load contrast test set
coconot_contrast_eval = load_dataset("allenai/coconot", "contrast", split="test")

# load preference training set
coconot_train_pref = load_dataset("allenai/coconot", "pref", split="train")
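
The original set also includes SFT training examples. Assuming the "original" configuration exposes them under a train split (the split name here is an assumption; please check the dataset card on the Hub), they can be loaded the same way:

# load original SFT training set (split name assumed; see the dataset card)
coconot_train_sft = load_dataset("allenai/coconot", "original", split="train")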

Seed Prompts

You can find the seed prompts used for generating the data in the prompts/ folder.

📦 Installing Packages

For evaluation, please first install the open-instruct module, which provides the inference and finetuning code. Follow the installation instructions in open-instruct.

📊 Evaluation

Once open-instruct is installed, run the following command to evaluate a model (hf_model_name_or_path):

bash open-instruct-predict-and-refusal-evaluate.sh ./data/coconot_eval.jsonl <hf_model_name_or_path> "prompt" "false" "refusal" "gpt-3.5-turbo"

You can replace gpt-3.5-turbo with a different judge model such as gpt-4.
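
For example, to judge responses with gpt-4 instead, keep all other arguments the same:

bash open-instruct-predict-and-refusal-evaluate.sh ./data/coconot_eval.jsonl <hf_model_name_or_path> "prompt" "false" "refusal" "gpt-4"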

Note that you can find our category-specific rubrics for evaluating responses here.

🚀 Models

We will release our model checkpoints trained for noncompliance on Hugging Face soon!

Acknowledgement

We are grateful to the Tulu team for providing the open-instruct codebase used for inference and finetuning.

Citation

If you find this work relevant to your research, please cite us using:

@misc{brahman2024artsayingnocontextual,
      title={The Art of Saying No: Contextual Noncompliance in Language Models}, 
      author={Faeze Brahman and Sachin Kumar and Vidhisha Balachandran and Pradeep Dasigi and Valentina Pyatkin and Abhilasha Ravichander and Sarah Wiegreffe and Nouha Dziri and Khyathi Chandu and Jack Hessel and Yulia Tsvetkov and Noah A. Smith and Yejin Choi and Hannaneh Hajishirzi},
      year={2024},
      eprint={2407.12043},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.12043}, 
}
