Automatic Recognition of Learning Resource Category in a Digital Library

S Banerjee, DK Sanyal, S Chattopadhyay, PK Bhowmick, PP Das

2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2021•ieeexplore.ieee.org

Digital libraries generally need to process a large volume of diverse document types. The collection and tagging of metadata is a long, error-prone, labor-intensive task. We are attempting to build an automatic metadata extractor for digital libraries. In this work, we present the Heterogeneous Learning Resources (HLR) dataset for document image classification. Each learning resource is first decomposed into its constituent document images (sheets), which are then passed through an OCR tool to obtain their textual representation. The document image and its textual content are classified with state-of-the-art classifiers. Finally, the labels of the constituent document images are used to predict the label of the overall document.
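
The abstract does not say how the sheet-level labels are combined into a document-level label. As one plausible illustration, the sketch below assumes a simple majority vote. The function name `aggregate_sheet_labels` and the category names in the usage note are hypothetical and are not taken from the paper or the HLR dataset.

```python
from collections import Counter
from typing import Sequence


def aggregate_sheet_labels(sheet_labels: Sequence[str]) -> str:
    """Predict a document-level label by majority vote over per-sheet labels.

    Assumption: majority voting stands in for whatever aggregation the
    authors actually use. Ties go to the label that was seen first.
    """
    if not sheet_labels:
        raise ValueError("a document must contain at least one sheet")
    # Count how many sheets were assigned each label.
    counts = Counter(sheet_labels)
    # max() returns the first maximal key in iteration order, and Counter
    # preserves first-seen order, so ties resolve to the earliest label.
    return max(counts, key=counts.get)
```

For example, a resource whose three sheets were labeled `"slides"`, `"slides"`, and `"thesis"` would be assigned `"slides"` under this scheme. A weighted vote over classifier confidences would be a natural alternative if per-sheet probabilities are available.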
