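[KDD 2020] LayoutLM, a General-Purpose Document Pre-training Model: Modeling Document Layout and Visual Information for Multimodal Alignment During Pre-training - Zhuanzhi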


August 23, 2020 · Zhuanzhi

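A large body of research has shown that large-scale pre-trained language models, trained with self-supervised tasks, can effectively capture the semantic information in text during pre-training and, after fine-tuning on downstream tasks, substantially improve model performance. However, existing pre-trained language models work mainly on a single modality, text, and ignore the visual and layout structure of a document, which is naturally aligned with its text. To address this gap, researchers proposed LayoutLM [1][2], a general-purpose document pre-training model that jointly models document layout information and visual information, so that the model learns multimodal alignment during pre-training.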
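To make "modeling layout information" concrete, here is a minimal sketch of how a token's bounding box can be turned into layout-aware input embeddings. It assumes the 2-D position scheme described in the LayoutLM paper: box coordinates (x0, y0, x1, y1) are rescaled to an integer grid (0 to 1000 here, as in public LayoutLM implementations), and four position embeddings are looked up from two shared tables (one for x, one for y) and added to the token embedding. The helper names `normalize_bbox`, `make_table` and `layout_embedding`, the toy random tables, and the sizes are hypothetical and for illustration only; 1-D position embeddings, segment embeddings and visual (image) features are deliberately left out.

```python
import random

# Assumption: box coordinates are rescaled to an integer grid 0..GRID,
# following the convention of public LayoutLM implementations.
GRID = 1000


def normalize_bbox(bbox, page_width, page_height):
    """Rescale a pixel box (x0, y0, x1, y1) to integers in 0..GRID."""
    x0, y0, x1, y1 = bbox
    return (
        int(GRID * x0 / page_width),
        int(GRID * y0 / page_height),
        int(GRID * x1 / page_width),
        int(GRID * y1 / page_height),
    )


def make_table(rows, dim, rng):
    """Hypothetical toy embedding table: `rows` random vectors of length `dim`."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]


def layout_embedding(token_id, bbox, word_table, x_table, y_table):
    """Sum the token embedding with four 2-D position embeddings.

    x0 and x1 index the shared x_table; y0 and y1 index the shared y_table.
    """
    x0, y0, x1, y1 = bbox
    parts = [word_table[token_id], x_table[x0], y_table[y0], x_table[x1], y_table[y1]]
    # Element-wise sum across the five vectors.
    return [sum(values) for values in zip(*parts)]


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 8
    word_table = make_table(100, dim, rng)  # hypothetical 100-token vocabulary
    x_table = make_table(GRID + 1, dim, rng)  # GRID + 1 rows so index GRID is valid
    y_table = make_table(GRID + 1, dim, rng)

    box = normalize_bbox((120, 40, 300, 80), page_width=600, page_height=800)
    print(box)  # (200, 50, 500, 100)
    vec = layout_embedding(7, box, word_table, x_table, y_table)
    print(len(vec))  # 8
```

Sharing one table per axis means a word's left and right edges land in the same coordinate space, so the model can compare positions across tokens; normalizing to a fixed grid makes pages of different pixel sizes comparable.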

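In practice, LayoutLM reaches state-of-the-art performance with very little labeled data. The researchers evaluated it on three different types of downstream tasks: form understanding, receipt understanding, and document image classification. The experiments show that the layout and visual information introduced during pre-training transfers effectively to downstream tasks, yielding significant accuracy improvements on all three.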

https://www.zhuanzhi.ai/paper/d936ea435305d4f5a4835461799ea355
