Benchmarking Complex Instruction-Following with Multiple Constraints Composition

Wen, Bosi; Ke, Pei; Gu, Xiaotao; Wu, Lindong; Huang, Hao; Zhou, Jinfeng; Li, Wenchuang; Hu, Binxin; Gao, Wendy; Xu, Jiaxin; Liu, Yiming; Tang, Jie; Wang, Hongning; Huang, Minlie

arXiv:2407.03978v1 (cs)

[Submitted on 4 Jul 2024 (this version), latest version 11 Jul 2024 (v2)]

Abstract: Instruction following is one of the fundamental capabilities of large language models (LLMs). As LLMs grow more capable, they are increasingly applied to complex human instructions in real-world scenarios, so evaluating their ability to follow complex instructions has become a critical research problem. Existing benchmarks mainly model different types of constraints in human instructions while neglecting the composition of those constraints, which is an indispensable constituent of complex instructions. To this end, we propose ComplexBench, a benchmark for comprehensively evaluating the ability of LLMs to follow complex instructions composed of multiple constraints. We propose a hierarchical taxonomy for complex instructions, comprising 4 constraint types, 19 constraint dimensions, and 4 composition types, and manually collect a high-quality dataset accordingly. To make the evaluation reliable, we augment LLM-based evaluators with rules to verify whether generated texts satisfy each constraint and composition. Furthermore, we compute the final evaluation score based on the dependency structure determined by the composition types. ComplexBench identifies significant deficiencies in existing LLMs when they deal with complex instructions that compose multiple constraints.
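
The abstract does not spell out the scoring procedure, so the following is only a minimal sketch of what dependency-aware scoring could look like, not the ComplexBench implementation. It assumes each constraint is checked by a simple rule (the `Check` class, the `depends_on` field, and the `score` function are hypothetical names introduced here for illustration), and that a check whose prerequisite fails is itself counted as failed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Check:
    """One yes/no check on a response (hypothetical structure, not from the paper)."""
    name: str
    verify: Callable[[str], bool]          # rule-based verifier for this constraint
    depends_on: List[str] = field(default_factory=list)  # prerequisite check names


def score(response: str, checks: List[Check]) -> float:
    """Fraction of checks passed, where a check only counts as passed
    if all of its prerequisites passed too. Assumes acyclic dependencies."""
    if not checks:
        raise ValueError("at least one check is required")
    by_name: Dict[str, Check] = {c.name: c for c in checks}
    raw: Dict[str, bool] = {c.name: c.verify(response) for c in checks}
    memo: Dict[str, bool] = {}

    def passed(name: str) -> bool:
        # A check passes only if its own rule holds and every prerequisite passed.
        if name not in memo:
            memo[name] = raw[name] and all(passed(d) for d in by_name[name].depends_on)
        return memo[name]

    return sum(passed(c.name) for c in checks) / len(checks)


# Illustrative constraints (assumed for this example only).
checks = [
    Check("max_20_words", lambda t: len(t.split()) <= 20),
    Check("mentions_python", lambda t: "python" in t.lower()),
    Check("ends_with_period", lambda t: t.strip().endswith("."),
          depends_on=["mentions_python"]),
]

print(score("Python is fast.", checks))
print(score("Rust is fast.", checks))
```

In the second call the response ends with a period, but that check depends on `mentions_python`, which fails, so only one of the three checks counts as passed. In a setup like the one the abstract describes, the lambdas would be replaced by LLM-based judgments for constraints that rules cannot verify.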

Comments:	20 pages, 7 figures
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2407.03978 [cs.CL]
	(or arXiv:2407.03978v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2407.03978

Submission history

From: Bosi Wen
[v1] Thu, 4 Jul 2024 14:50:45 UTC (6,198 KB)
[v2] Thu, 11 Jul 2024 06:44:47 UTC (6,199 KB)
