From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models

Huang, Kung-Hsiang; Chan, Hou Pong; Fung, Yi R.; Qiu, Haoyi; Zhou, Mingyang; Joty, Shafiq; Chang, Shih-Fu; Ji, Heng

arXiv:2403.12027 (cs)

[Submitted on 18 Mar 2024 (v1), last revised 25 Mar 2024 (this version, v2)]

Title: From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models

Authors: Kung-Hsiang Huang, Hou Pong Chan, Yi R. Fung, Haoyi Qiu, Mingyang Zhou, Shafiq Joty, Shih-Fu Chang, Heng Ji

Abstract: Data visualization in the form of charts plays a pivotal role in data analysis, offering critical insights and aiding informed decision-making. Automatic chart understanding has advanced significantly in recent years with the rise of large foundation models. Foundation models, such as large language models, have revolutionized various natural language processing tasks and are increasingly being applied to chart understanding tasks. This survey paper provides a comprehensive overview of the recent developments, challenges, and future directions in chart understanding within the context of these foundation models. We review fundamental building blocks crucial for studying chart understanding tasks. Additionally, we explore various tasks, their evaluation metrics, and the sources of both charts and textual inputs. Various modeling strategies are then examined, encompassing both classification-based and generation-based approaches, along with tool augmentation techniques that enhance chart understanding performance. Furthermore, we discuss the state-of-the-art performance on each task and how it can be improved. Challenges and future directions are addressed, highlighting the importance of several topics, such as domain-specific charts, the lack of effort in developing evaluation metrics, and agent-oriented settings. This survey paper serves as a comprehensive resource for researchers and practitioners in the fields of natural language processing, computer vision, and data analysis, providing valuable insights and directions for future research in chart understanding leveraging large foundation models. The studies mentioned in this paper, along with emerging new research, will be continually updated at: this https URL.

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2403.12027 [cs.CL]
	(or arXiv:2403.12027v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2403.12027

Submission history

From: Kung-Hsiang Huang
[v1] Mon, 18 Mar 2024 17:57:09 UTC (415 KB)
[v2] Mon, 25 Mar 2024 17:39:10 UTC (454 KB)
