OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition

Wan, Jianqiang; Song, Sibo; Yu, Wenwen; Liu, Yuliang; Cheng, Wenqing; Huang, Fei; Bai, Xiang; Yao, Cong; Yang, Zhibo


arXiv:2403.19128 (cs)

[Submitted on 28 Mar 2024]



Abstract: Recently, visually-situated text parsing (VsTP) has experienced notable advancements, driven by the increasing demand for automated document understanding and the emergence of Generative Large Language Models (LLMs) capable of processing document-based questions. Various methods have been proposed to address the challenging problem of VsTP. However, due to the diversified targets and heterogeneous schemas, previous works usually design task-specific architectures and objectives for individual tasks, which inadvertently leads to modal isolation and complex workflows. In this paper, we propose a unified paradigm for parsing visually-situated text across diverse scenarios. Specifically, we devise a universal model, called OmniParser, which can simultaneously handle three typical visually-situated text parsing tasks: text spotting, key information extraction, and table recognition. In OmniParser, all tasks share a unified encoder-decoder architecture, a unified objective (point-conditioned text generation), and a unified input and output representation (prompt and structured sequences). Extensive experiments demonstrate that the proposed OmniParser achieves state-of-the-art (SOTA) or highly competitive performance on 7 datasets for the three visually-situated text parsing tasks, despite its unified, concise design. The code is available at this https URL.

Comments:	CVPR 2024
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2403.19128 [cs.CV]
	(or arXiv:2403.19128v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2403.19128

Submission history

From: Jianqiang Wan
[v1] Thu, 28 Mar 2024 03:51:14 UTC (14,948 KB)
