A Closer Look into Automatic Evaluation Using Large Language Models

Chiang, Cheng-Han; Lee, Hung-yi

Computer Science > Computation and Language

arXiv:2310.05657 (cs)

[Submitted on 9 Oct 2023]

Abstract: Using large language models (LLMs) to evaluate text quality has recently gained popularity. Several prior works explore using LLMs for evaluation, but they differ in the details of the evaluation process. In this paper, we analyze LLM evaluation (Chiang and Lee, 2023) and G-Eval (Liu et al., 2023), and we discuss how these details affect how well the ratings given by LLMs correlate with human ratings. We find that the auto Chain-of-Thought (CoT) used in G-Eval does not always make G-Eval more aligned with human ratings. We also show that forcing the LLM to output only a numeric rating, as in G-Eval, is suboptimal. Finally, we show that asking the LLM to explain its own ratings consistently improves the correlation between ChatGPT's ratings and human ratings and pushes state-of-the-art (SoTA) correlations on two meta-evaluation datasets.
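Meta-evaluation of this kind judges an automatic evaluator by how strongly its ratings correlate with human ratings of the same texts. As a minimal sketch (not the authors' code), the snippet below computes two standard coefficients, Pearson's r and Kendall's tau-b, for two rating lists. The helper names `pearson` and `kendall_tau_b` and the sample scores are hypothetical, and the paper's exact correlation setup (for example, how scores are grouped before correlating) is not reproduced here. In practice, `scipy.stats.pearsonr` and `scipy.stats.kendalltau` compute the same quantities.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson's r between two equal-length lists of ratings."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def kendall_tau_b(x, y):
    """Kendall's tau-b, which corrects for tied ratings (common on 1-5 scales)."""
    concordant = discordant = ties_x = ties_y = 0
    # Compare every unordered pair of items once.
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        dx, dy = xi - xj, yi - yj
        if dx == 0:
            ties_x += 1  # pair tied in the first rating list
        if dy == 0:
            ties_y += 1  # pair tied in the second rating list
        if dx * dy > 0:
            concordant += 1  # both lists order the pair the same way
        elif dx * dy < 0:
            discordant += 1  # the lists disagree on the pair's order
    n0 = len(x) * (len(x) - 1) // 2  # total number of pairs
    return (concordant - discordant) / math.sqrt((n0 - ties_x) * (n0 - ties_y))


if __name__ == "__main__":
    # Hypothetical 1-5 ratings for six texts, for illustration only.
    llm_scores = [4, 2, 5, 3, 1, 3]
    human_scores = [5, 1, 4, 3, 2, 3]
    print("Pearson r:", round(pearson(llm_scores, human_scores), 3))
    print("Kendall tau-b:", round(kendall_tau_b(llm_scores, human_scores), 3))
```

Tau-b is used here instead of the simpler tau-a because discrete rating scales produce many ties, and tau-a would shrink toward zero as ties accumulate.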

Comments:	EMNLP 2023 findings (short paper). Code: this https URL
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2310.05657 [cs.CL]
	(or arXiv:2310.05657v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2310.05657

Submission history

From: Cheng-Han Chiang
[v1] Mon, 9 Oct 2023 12:12:55 UTC (40 KB)