Re-Examining Calibration: The Case of Question Answering

Si, Chenglei; Zhao, Chen; Min, Sewon; Boyd-Graber, Jordan

arXiv:2205.12507 (cs)

[Submitted on 25 May 2022 (v1), last revised 24 Oct 2022 (this version, v2)]

Abstract: For users to trust model predictions, they need to understand model outputs, particularly their confidence. Calibration aims to adjust (calibrate) a model's confidence to match its expected accuracy. We argue that the traditional calibration evaluation does not promote effective calibration: for example, it can encourage assigning the same mediocre confidence score to all predictions, which does not help users distinguish correct predictions from wrong ones. Building on these observations, we propose a new calibration metric, MacroCE, that better captures whether the model assigns low confidence to wrong predictions and high confidence to correct predictions. Focusing on the practical application of open-domain question answering, we examine conventional calibration methods applied to the widely used retriever-reader pipeline, none of which brings significant gains under our new MacroCE metric. Toward better calibration, we propose a new calibration method (ConsCal) that uses not just final model predictions but also whether multiple model checkpoints make consistent predictions. Altogether, we provide an alternative view of calibration along with a new metric, a re-evaluation of existing calibration methods on our metric, and a proposal for a more effective calibration method.
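The abstract's central claim, that a flat "mediocre" confidence can look well calibrated under the traditional evaluation, can be illustrated with a small sketch. The abstract gives no formulas, so both functions below are assumptions: `ece` is a common equal-width-bin form of expected calibration error, and `macro_ce` is one plausible reading of MacroCE as the macro-average of the mean error on correct predictions (1 - confidence) and the mean error on wrong predictions (confidence). The function names and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def ece(confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10) -> float:
    """Expected calibration error with equal-width confidence bins (assumed form)."""
    n = len(confidences)
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 falls in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, y))
    total = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(1 for _, y in b if y) / len(b)
        # Weight each bin's |confidence - accuracy| gap by its share of examples.
        total += len(b) / n * abs(avg_conf - acc)
    return total


def macro_ce(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """Macro-averaged instance-level error (assumed reading of MacroCE)."""
    pos = [1.0 - c for c, y in zip(confidences, correct) if y]   # error on correct predictions
    neg = [c for c, y in zip(confidences, correct) if not y]     # error on wrong predictions
    ice_pos = sum(pos) / len(pos) if pos else 0.0
    ice_neg = sum(neg) / len(neg) if neg else 0.0
    return 0.5 * (ice_pos + ice_neg)


# Toy data: half of the predictions are correct.
correct = [True] * 5 + [False] * 5
flat = [0.5] * 10                # same mediocre confidence everywhere
sharp = [0.9] * 5 + [0.1] * 5    # high on correct, low on wrong

print("flat :", ece(flat, correct), macro_ce(flat, correct))
print("sharp:", ece(sharp, correct), macro_ce(sharp, correct))
```

On this toy data the flat scorer gets a binned ECE of 0 yet a macro-averaged error of 0.5, while the sharp scorer, which actually separates correct from wrong answers, gets roughly 0.1 on both. Under these assumptions the binned metric prefers the uninformative scorer and the macro-averaged one does not.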

Comments: EMNLP 2022 Findings
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2205.12507 [cs.CL] (or arXiv:2205.12507v2 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.2205.12507

Submission history

From: Chenglei Si
[v1] Wed, 25 May 2022 05:49:56 UTC (612 KB)
[v2] Mon, 24 Oct 2022 00:57:49 UTC (1,715 KB)
