Multi-Image Summarization: Textual Summary from a Set of Cohesive Images

Trieu, Nicholas; Goodman, Sebastian; Narayana, Pradyumna; Sone, Kazoo; Soricut, Radu

arXiv:2006.08686 (cs)

[Submitted on 15 Jun 2020]

Abstract: Multi-sentence summarization is a well-studied problem in NLP, while generating a description for a single image is a well-studied problem in Computer Vision. However, for applications such as image cluster labeling or web page summarization, summarizing a set of images is also a useful and challenging task. This paper proposes the new task of multi-image summarization, which aims to generate a concise and descriptive textual summary given a coherent set of input images. We propose a model that extends the Transformer-based image-captioning architecture from a single image to multiple images. A dense average image feature aggregation network allows the model to focus on a coherent subset of attributes across the input images. We explore various input representations to the Transformer network and empirically show that aggregated image features are superior to individual image embeddings. We additionally show that pretraining the model parameters on a single-image captioning task further improves performance and appears to be particularly effective in eliminating hallucinations in the output.
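The abstract names a "dense average" aggregation of image features but gives no layer sizes, activation, or ordering. The sketch below is one plausible reading, not the authors' implementation: each image's feature vector passes through a shared fully connected layer, and the results are averaged across the set into a single vector that a Transformer could consume. The function names (`dense`, `aggregate`), the ReLU activation, and the toy dimensions are all assumptions made for illustration.

```python
import random
from typing import List

Vector = List[float]


def dense(x: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Apply one fully connected layer with ReLU (assumed): relu(W x + b)."""
    return [
        max(0.0, sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
        for row, b_i in zip(weights, bias)
    ]


def aggregate(image_features: List[Vector], weights: List[Vector], bias: Vector) -> Vector:
    """Project each image's features with shared weights, then average over the set."""
    if not image_features:
        raise ValueError("need at least one image feature vector")
    projected = [dense(f, weights, bias) for f in image_features]
    n = len(projected)
    # zip(*projected) iterates over feature dimensions across all images.
    return [sum(column) / n for column in zip(*projected)]


# Toy example (hypothetical sizes): 3 images, 4-dim inputs, 2-dim aggregate.
rng = random.Random(0)
features = [[rng.random() for _ in range(4)] for _ in range(3)]
W = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
b = [0.0, 0.0]
summary_vector = aggregate(features, W, b)
```

Because the average is taken over the set, the aggregated vector does not depend on the order of the input images, which suits an unordered image cluster. Attributes shared by most images contribute more to the mean than attributes present in only one.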

Comments:	9 pages, 5 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2006.08686 [cs.CV]
	(or arXiv:2006.08686v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2006.08686

Submission history

From: Nicholas Trieu
[v1] Mon, 15 Jun 2020 18:45:35 UTC (3,863 KB)
