
layoutlmv2-base-uncased

Other · by Microsoft

Microsoft's LayoutLMv2-base is a pretrained model for document understanding that jointly models text, layout, and visual features.


Model Card

Multimodal (text + layout/format + image) pre-training for document AI

Documentation for this model is available in the Hugging Face Transformers library documentation.
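The layout modality is supplied as one bounding box per token. The Transformers documentation for the LayoutLM family describes normalizing these boxes to a 0 to 1000 integer scale, independent of the page's pixel size. The sketch below shows that normalization for a single word box. The function name `normalize_bbox` and the example coordinates are hypothetical illustrations, not part of this model card.

```python
from typing import List, Sequence


def normalize_bbox(bbox: Sequence[float], width: float, height: float) -> List[int]:
    """Scale a pixel box (x0, y0, x1, y1) to the 0-1000 integer grid.

    Hypothetical helper for illustration: LayoutLM-family models expect word
    boxes on a 0-1000 scale regardless of the page's pixel dimensions.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    x0, y0, x1, y1 = bbox
    # Multiply before dividing so integer pixel inputs stay exact.
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]


if __name__ == "__main__":
    # One OCR'd word on a 500 x 1000 pixel page (made-up example values).
    print(normalize_bbox((50, 100, 150, 200), width=500, height=1000))
```

Because every page is mapped onto the same grid, the model's layout embeddings see comparable coordinates whether the scan was 600 or 3000 pixels wide.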

Microsoft Document AI | GitHub

Introduction

LayoutLMv2 is an improved version of LayoutLM with new pre-training tasks that model the interaction among text, layout, and image in a single multimodal framework. It outperforms strong baselines and achieves new state-of-the-art results on a wide variety of downstream visually-rich document understanding tasks, including FUNSD (0.7895 → 0.8420), CORD (0.9493 → 0.9601), SROIE (0.9524 → 0.9781), Kleister-NDA (0.834 → 0.852), RVL-CDIP (0.9443 → 0.9564), and DocVQA (0.7295 → 0.8672).
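The before → after pairs above can be turned into absolute gains with simple subtraction. This is a minimal sketch that uses only the scores listed on this card, reading the first number of each pair as the baseline and the second as LayoutLMv2; the names `REPORTED` and `absolute_gains` are hypothetical. The benchmarks use different metrics (for example, F1 for FUNSD, accuracy for RVL-CDIP, and ANLS for DocVQA), so the gains are not directly comparable across tasks.

```python
from typing import Dict, Tuple

# Scores as listed on this model card: (baseline, LayoutLMv2).
REPORTED: Dict[str, Tuple[float, float]] = {
    "FUNSD": (0.7895, 0.8420),
    "CORD": (0.9493, 0.9601),
    "SROIE": (0.9524, 0.9781),
    "Kleister-NDA": (0.834, 0.852),
    "RVL-CDIP": (0.9443, 0.9564),
    "DocVQA": (0.7295, 0.8672),
}


def absolute_gains(scores: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Return the LayoutLMv2 score minus the baseline score for each benchmark."""
    return {name: after - before for name, (before, after) in scores.items()}


if __name__ == "__main__":
    gains = absolute_gains(REPORTED)
    # Print benchmarks from largest to smallest absolute gain.
    for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:13s} +{gain:.4f}")
```

By this arithmetic, DocVQA shows the largest absolute gain (about 0.138) and CORD the smallest (about 0.011).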

LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding. Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou. ACL 2021.

Author: Microsoft (verified organization: microsoft)
Details
Downloads: 559.4K
Likes: 68
Access: Open Source
License: cc-by-nc-sa-4.0
Library: transformers
Created: Mar 2, 2022
Updated: Sep 16, 2022
Languages: en
