The Art of Saying No: Contextual Noncompliance in Language Models

We introduce 🥥 CoCoNot, a resource for benchmarking and enhancing noncompliance behavior of large language models.

📄 Data

CoCoNot contains two components:

  • Original Set: For testing and improving contextual noncompliance in LMs.

    • This set contains 1,001 evaluation and 11,477 SFT training examples.
  • Contrast Set: For testing and mitigating exaggerated noncompliance (over-refusals) in LMs.

    • This set contains 379 evaluation and 927 preference data examples.

You can also view and download 🥥 CoCoNot on the 🤗 Hugging Face Hub, or load it directly with the datasets library:

from datasets import load_dataset


# load original test set
coconot_eval = load_dataset("allenai/coconot", "original", split="test")

# load contrast test set
coconot_contrast_eval = load_dataset("allenai/coconot", "contrast", split="test")

# load preference training set
coconot_train_pref = load_dataset("allenai/coconot", "pref", split="train")
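
The original set also includes SFT training examples. Assuming the "original" configuration exposes them under a train split (the split name here is an assumption; please check the dataset card on the Hub), they can be loaded the same way:

# load original SFT training set (split name assumed; see the dataset card)
coconot_train_sft = load_dataset("allenai/coconot", "original", split="train")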

Seed Prompts

You can find the seed prompts used for generating the data in the prompts/ folder.

📦 Installing Packages

For evaluation, please first install the open-instruct module, which provides the inference and finetuning code. Follow the installation instructions in open-instruct.

📊 Evaluation

Once open-instruct is installed, run the following command to evaluate a model (hf_model_name_or_path):

bash open-instruct-predict-and-refusal-evaluate.sh ./data/coconot_eval.jsonl <hf_model_name_or_path> "prompt" "false" "refusal" "gpt-3.5-turbo"

You can replace gpt-3.5-turbo with a different judge model such as gpt-4.
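
For example, to judge responses with gpt-4 instead, keep all other arguments the same:

bash open-instruct-predict-and-refusal-evaluate.sh ./data/coconot_eval.jsonl <hf_model_name_or_path> "prompt" "false" "refusal" "gpt-4"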

Note that you can find our category-specific rubrics for evaluating responses here.

🚀 Models

We will release our model checkpoints trained for noncompliance on Hugging Face soon!

Acknowledgement

We are grateful to the Tulu team for providing the open-instruct codebase used for inference and finetuning.

Citation

If you find this work relevant to your research, please cite us using:

@misc{brahman2024artsayingnocontextual,
      title={The Art of Saying No: Contextual Noncompliance in Language Models}, 
      author={Faeze Brahman and Sachin Kumar and Vidhisha Balachandran and Pradeep Dasigi and Valentina Pyatkin and Abhilasha Ravichander and Sarah Wiegreffe and Nouha Dziri and Khyathi Chandu and Jack Hessel and Yulia Tsvetkov and Noah A. Smith and Yejin Choi and Hannaneh Hajishirzi},
      year={2024},
      eprint={2407.12043},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.12043}, 
}
