LayoutLM: Pre-training of Text and Layout for Document Image Understanding

dc.contributor.author: Lei Cui
dc.contributor.author: second author
dc.date.accessioned: 2025-01-07T08:24:35Z
dc.date.issued: 2020-08-20
dc.description: Pre-training techniques have shown success in NLP tasks but often overlook the layout and style information that is critical for document image understanding. LayoutLM addresses this gap by jointly modeling text and layout in scanned documents, which benefits real-world tasks such as information extraction from scanned documents.
dc.description.abstract: Pre-training techniques have been verified successfully in a variety of NLP tasks in recent years. Despite the widespread use of pre-training models for NLP applications, they almost exclusively focus on text-level manipulation, while neglecting layout and style information that is vital for document image understanding. In this paper, we propose the LayoutLM to jointly model interactions between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents. Furthermore, we also leverage image features to incorporate words’ visual information into LayoutLM. To the best of our knowledge, this is the first time that text and layout are jointly learned in a single framework for document-level pre-training. It achieves new state-of-the-art results in several downstream tasks, including form understanding (from 70.72 to 79.27), receipt understanding (from 94.02 to 95.24) and document image classification (from 93.07 to 94.42). The code and pre-trained LayoutLM models are publicly available at https://aka.ms/layoutlm
dc.identifier.citation: Xu, Y., Li, M., Cui, L., Huang, S., Wei, F. and Zhou, M., 2020, August. LayoutLM: Pre-training of text and layout for document image understanding. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 1192-1200).
dc.identifier.uri: https://arxiv.org/pdf/1912.13318
dc.identifier.uri: https://ds.uofallujah.edu.iq/handle/123456789/52
dc.language.iso: en_US
dc.publisher: ACM
dc.title: LayoutLM: Pre-training of Text and Layout for Document Image Understanding
dc.type: Article
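The abstract says LayoutLM jointly models text and layout but does not say how. Below is a minimal Python sketch, assuming the 2-D position embedding idea from the full paper: each token's bounding box (x0, y0, x1, y1), scaled to an integer grid of 0 to 1000 (the grid size is an assumption), indexes one embedding table shared by x0 and x1 and another shared by y0 and y1, and those four vectors are added to the word embedding. All names here (`normalize_box`, `layout_aware_embedding`, `HIDDEN`, `GRID`, the two-word vocabulary) are hypothetical, and the tables hold seeded random numbers instead of trained weights.

```python
import random

# Toy sizes chosen for illustration; a trained model uses a much larger hidden size.
HIDDEN = 8
GRID = 1000  # assumption: box coordinates are scaled to integers in 0..GRID


def _scale(value, extent):
    """Map a pixel coordinate onto the 0..GRID grid, clamping at the page edges."""
    return max(0, min(GRID, int(GRID * value / extent)))


def normalize_box(box, page_width, page_height):
    """Scale a pixel bounding box (x0, y0, x1, y1) to the 0..GRID grid."""
    x0, y0, x1, y1 = box
    return (
        _scale(x0, page_width),
        _scale(y0, page_height),
        _scale(x1, page_width),
        _scale(y1, page_height),
    )


def make_table(rows, dim, rng):
    """Toy embedding table: one random vector per row (stand-in for learned weights)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]


def add_vectors(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(values) for values in zip(*vectors)]


rng = random.Random(0)
VOCAB = ["Invoice", "Total"]  # hypothetical two-word vocabulary
word_table = dict(zip(VOCAB, make_table(len(VOCAB), HIDDEN, rng)))
x_table = make_table(GRID + 1, HIDDEN, rng)  # shared by x0 and x1
y_table = make_table(GRID + 1, HIDDEN, rng)  # shared by y0 and y1


def layout_aware_embedding(token, box):
    """Word embedding plus the four 2-D position embeddings of its normalized box."""
    x0, y0, x1, y1 = box
    return add_vectors(
        word_table[token],
        x_table[x0],
        y_table[y0],
        x_table[x1],
        y_table[y1],
    )


if __name__ == "__main__":
    # A box on a 612 x 792 page (US Letter in points) maps to (117, 90, 235, 113).
    box = normalize_box((72, 72, 144, 90), page_width=612, page_height=792)
    print(box)
    print(len(layout_aware_embedding("Total", box)))  # one HIDDEN-sized vector
```

Because the same word gets a different vector whenever its box moves, position on the page becomes part of the token representation. The sketch leaves out the 1-D position and segment embeddings of the BERT-style backbone and the image features the abstract mentions.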

Files

Original bundle

Name: 339.webp
Size: 4.21 KB
Format: WebP is a modern image format that provides superior lossless and lossy compression for images on the web.

License bundle

Name: license.txt
Size: 211 B
Format: Item-specific license agreed to upon submission