Doc-GCN: Heterogeneous Graph Convolutional Networks for Document Layout Analysis

Luo, Siwen; Ding, Yihao; Long, Siqu; Poon, Josiah; Han, Soyeon Caren

arXiv:2208.10970 (cs)

[Submitted on 22 Aug 2022 (v1), last revised 19 Sep 2022 (this version, v2)]

Abstract: Recognizing the layout of unstructured digital documents is crucial when parsing them into a structured, machine-readable format for downstream applications. Recent studies in Document Layout Analysis (DLA) usually rely on computer vision models to understand documents while ignoring other information, such as contextual information or the relations between document components, which is vital to capture. Our Doc-GCN presents an effective way to harmonize and integrate heterogeneous aspects for Document Layout Analysis. We first construct graphs to explicitly describe four main aspects: syntactic, semantic, density, and appearance/visual information. Then, we apply graph convolutional networks to represent each aspect of information and use pooling to integrate them. Finally, we aggregate the aspects and feed them into 2-layer MLPs for document layout component classification. Our Doc-GCN achieves new state-of-the-art results on three widely used DLA datasets.
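
The pipeline described in the abstract (one graph per aspect, a graph convolution on each, pooling, then a 2-layer MLP) can be illustrated with a minimal NumPy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the single GCN layer per aspect, the symmetric adjacency normalization, element-wise max pooling across aspects, the layer sizes, and the random untrained weights and random graphs.

```python
import numpy as np

# The four aspects named in the abstract; the graphs built for them here are random stand-ins.
ASPECTS = ("syntactic", "semantic", "density", "visual")

def normalize_adjacency(adj):
    """Add self-loops and apply symmetric normalization D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))  # degree >= 1 because of self-loops
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(adj, feats, weight):
    """One graph convolution followed by ReLU: relu(A_norm @ X @ W)."""
    return np.maximum(normalize_adjacency(adj) @ feats @ weight, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def doc_gcn_sketch(graphs, params):
    """Return per-component class probabilities, shape (n_nodes, n_classes)."""
    # One GCN per aspect; each yields node embeddings of shape (n_nodes, hidden).
    embeddings = [
        gcn_layer(adj, feats, params["gcn"][name])
        for name, (adj, feats) in graphs.items()
    ]
    # Assumed integration step: element-wise max pooling across the aspects.
    pooled = np.max(np.stack(embeddings), axis=0)
    # 2-layer MLP classifier applied to every document component (node).
    hidden = np.maximum(pooled @ params["w1"] + params["b1"], 0.0)
    return softmax(hidden @ params["w2"] + params["b2"])

def random_inputs(n_nodes=6, feat_dim=8, hidden=16, n_classes=5, seed=0):
    """Build random symmetric graphs and untrained weights (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    graphs, gcn_weights = {}, {}
    for name in ASPECTS:
        upper = np.triu(rng.random((n_nodes, n_nodes)) > 0.5, k=1).astype(float)
        graphs[name] = (upper + upper.T, rng.normal(size=(n_nodes, feat_dim)))
        gcn_weights[name] = rng.normal(scale=0.1, size=(feat_dim, hidden))
    params = {
        "gcn": gcn_weights,
        "w1": rng.normal(scale=0.1, size=(hidden, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(scale=0.1, size=(hidden, n_classes)),
        "b2": np.zeros(n_classes),
    }
    return graphs, params

graphs, params = random_inputs()
probs = doc_gcn_sketch(graphs, params)
print(probs.shape)  # (6, 5): one probability row per document component
```

Because the weights are random, the probabilities carry no meaning; the sketch only shows how four aspect-specific graphs over the same set of layout components could flow into a single per-component classifier.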

Comments:	Accepted by COLING 2022
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2208.10970 [cs.CV]
	(or arXiv:2208.10970v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2208.10970

Submission history

From: Yihao Ding
[v1] Mon, 22 Aug 2022 07:22:05 UTC (9,098 KB)
[v2] Mon, 19 Sep 2022 05:59:40 UTC (9,099 KB)
