We introduce 🥥 CoCoNot, a resource for benchmarking and enhancing noncompliance behavior of large language models.
CoCoNot contains two components:
- Original Set: For testing and improving contextual noncompliance in LMs.
  - This set contains 1,001 evaluation and 11,477 SFT training examples.
- Contrast Set: For testing and mitigating exaggerated noncompliance (over-refusals) in LMs.
  - This set contains 379 evaluation and 927 preference data examples.
You can also view and download 🥥 CoCoNot on the 🤗 Hugging Face Hub, or load it with the `datasets` library:
```python
from datasets import load_dataset

# load original test set
coconot_eval = load_dataset("allenai/coconot", "original", split="test")

# load contrast test set
coconot_contrast_eval = load_dataset("allenai/coconot", "contrast", split="test")

# load preference training set
coconot_train_pref = load_dataset("allenai/coconot", "pref", split="train")
```
You can find the seed prompts used for generating the data in the `prompts/` folder.
For evaluation, please first install the open-instruct module, which provides inference and finetuning code; follow the installation instructions in the open-instruct repository.
Once open-instruct is installed, run the following command to evaluate a model (`hf_model_name_or_path`):

```bash
bash open-instruct-predict-and-refusal-evaluate.sh ./data/coconot_eval.jsonl <hf_model_name_or_path> "prompt" "false" "refusal" "gpt-3.5-turbo"
```

You can replace `gpt-3.5-turbo` with a different judge model such as `gpt-4`.
Note that you can find our category-specific rubrics for evaluating responses here.
We will release our model checkpoints trained for noncompliance on the Hugging Face Hub soon!
We greatly thank the Tulu team for providing the open-instruct codebase for model inference and finetuning.
If you find this work relevant to your research, please cite us:
```bibtex
@misc{brahman2024artsayingnocontextual,
      title={The Art of Saying No: Contextual Noncompliance in Language Models},
      author={Faeze Brahman and Sachin Kumar and Vidhisha Balachandran and Pradeep Dasigi and Valentina Pyatkin and Abhilasha Ravichander and Sarah Wiegreffe and Nouha Dziri and Khyathi Chandu and Jack Hessel and Yulia Tsvetkov and Noah A. Smith and Yejin Choi and Hannaneh Hajishirzi},
      year={2024},
      eprint={2407.12043},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.12043},
}
```