Quality Estimation for Image Captions Based on Large-scale Human Evaluations

Levinboim, Tomer; Thapliyal, Ashish V.; Sharma, Piyush; Soricut, Radu

arXiv:1909.03396 (cs)

[Submitted on 8 Sep 2019 (v1), last revised 1 Jun 2021 (this version, v2)]

Abstract: Automatic image captioning has improved significantly over the last few years, but the problem is far from solved: state-of-the-art models still often produce low-quality captions when used in the wild. In this paper, we focus on the task of Quality Estimation (QE) for image captions, which attempts to model caption quality from a human perspective and without access to ground-truth references, so that it can be applied at prediction time to detect low-quality captions produced on previously unseen images. For this task, we develop a human evaluation process that collects coarse-grained caption annotations from crowdsourced users, and use it to collect a large-scale dataset spanning more than 600k caption quality ratings. We then carefully validate the quality of the collected ratings and establish baseline models for this new QE task. Finally, we collect fine-grained caption quality annotations from trained raters, and use them to demonstrate that QE models trained on the coarse ratings can effectively detect and filter out low-quality image captions, thereby improving the user experience of captioning systems.
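The two steps the abstract describes, turning coarse crowd ratings into a per-caption quality target and then filtering captions by a predicted quality score at prediction time, can be illustrated with a minimal sketch. Everything named here is an assumption for illustration and is not taken from the paper: the binary rating encoding, the `RatedCaption` type, the helper names, and the 0.5 threshold. The QE scores passed to the filter stand in for the output of a trained model, which is not shown.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RatedCaption:
    """Hypothetical record: one caption with its coarse crowd ratings."""
    image_id: str
    caption: str
    ratings: list[int]  # assumed encoding: 1 = rated good, 0 = rated bad


def average_rating(item: RatedCaption) -> float:
    """Aggregate coarse ratings into a single score in [0, 1]."""
    return mean(item.ratings)


def filter_captions(scored: list[tuple[str, float]],
                    threshold: float = 0.5) -> list[tuple[str, float]]:
    """Keep (caption, qe_score) pairs whose score meets the threshold.

    The scores are assumed to come from a QE model that sees only the
    image and the caption, with no reference captions. The threshold
    value is illustrative.
    """
    return [(caption, score) for caption, score in scored if score >= threshold]


example = RatedCaption("img-001", "a dog running on a beach", [1, 1, 0, 1])
target = average_rating(example)

predictions = [("a dog running on a beach", 0.82), ("a close up of a thing", 0.21)]
kept = filter_captions(predictions, threshold=0.5)
```

Raising the threshold trades coverage for quality: fewer images receive a caption, but the captions that are shown are more likely to be rated as good.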

Comments:	10 pages, 6 figures, 3 tables. Accepted to NAACL 2021.
Subjects:	Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1909.03396 [cs.CL]
	(or arXiv:1909.03396v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1909.03396

Submission history

From: Tomer Levinboim
[v1] Sun, 8 Sep 2019 06:55:53 UTC (4,648 KB)
[v2] Tue, 1 Jun 2021 19:03:27 UTC (7,552 KB)
