OCR 无文件理解转换器 (OCR-free Document Understanding Transformer) - 专知论文

会员服务 ·

0

OCR · 可理解性 · SimPLe · MoDELS · Extensibility ·

2022 年 8 月 23 日

OCR-free Document Understanding Transformer

翻译：OCR 无文件理解转换器

Geewook Kim,Teakgyu Hong,Moonbin Yim,Jeongyeon Nam,Jinyoung Park,Jinyeong Yim,Wonseok Hwang,Sangdoo Yun,Dongyoon Han,Seunghyun Park

from arxiv, ECCV 2022. (v3) update table 2; add LayoutLM and update scores with the latest test script at https://github.com/clovaai/donut

Understanding document images (e.g., invoices) is a core but challenging task since it requires complex functions such as reading text and a holistic understanding of the document. Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs. Although such OCR-based approaches have shown promising performance, they suffer from 1) high computational costs for using OCR; 2) inflexibility of OCR models on languages or types of document; 3) OCR error propagation to the subsequent process. To address these issues, in this paper, we introduce a novel OCR-free VDU model named Donut, which stands for Document understanding transformer. As the first step in OCR-free VDU research, we propose a simple architecture (i.e., Transformer) with a pre-training objective (i.e., cross-entropy loss). Donut is conceptually simple yet effective. Through extensive experiments and analyses, we show a simple OCR-free VDU model, Donut, achieves state-of-the-art performances on various VDU tasks in terms of both speed and accuracy. In addition, we offer a synthetic data generator that helps the model pre-training to be flexible in various languages and domains. The code, trained model and synthetic data are available at https://github.com/clovaai/donut.

翻译：理解文件图像(例如,发票)是一项核心但具有挑战性的任务,因为它需要复杂的功能,例如阅读文本和全面理解文件。当前视觉文件理解(VDU)的方法将阅读文本的任务外包给现成的光学字符识别引擎,并侧重于与OCR产出有关的理解任务。虽然这种以OCR为基础的方法表现良好,但它们面临着以下困难:(1) 使用OCR的高计算成本;(2) OCR模式在语言或文件类型上的灵活性;(3) OCR错误传播到随后的进程。为了解决这些问题,我们在本文件中采用了名为Donuut的无OCRVDU新式模型。作为无OCR的VDU研究的第一步,我们提出了一个简单的结构(即变异器),其培训前的目标(即交叉损失)是高额计算成本。通过广泛的实验和分析,我们展示了简单的OCR-free VDU模型、Donut、实现文件理解变异格式的状态-艺术性能测试,在各种模型/变异形式上,我们提供各种数据格式的进度。

0

相关内容

OCR

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

325+阅读 · 2020年11月26日

2020数据工程师成长路线图

专知会员服务

41+阅读 · 2020年9月6日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

163+阅读 · 2019年10月12日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

谷歌发表的史上最强NLP模型BERT的官方代码和预训练模型可以下载了

谷歌发表的史上最强NLP模型BERT的官方代码和预训练模型可以下载了

AINLP

12+阅读 · 2018年11月1日

【论文推荐】最新六篇自动问答相关论文—无监督迁移学习、综述、生成式问答、QDEE、可扩展文档理解

【论文推荐】最新六篇自动问答相关论文—无监督迁移学习、综述、生成式问答、QDEE、可扩展文档理解

专知

12+阅读 · 2018年5月9日

【论文推荐】最新5篇信息抽取（IE）相关论文—开放信息抽取、不完整信息、主动学习、越南语、依存分析

【论文推荐】最新5篇信息抽取（IE）相关论文—开放信息抽取、不完整信息、主动学习、越南语、依存分析

专知

12+阅读 · 2018年2月2日

MHC-B单倍型鸡MD抗性相关miRNA的鉴定及功能靶基因研究

国家自然科学基金

0+阅读 · 2015年12月31日

面向机器翻译的多词表达语义分析及应用研究

国家自然科学基金

1+阅读 · 2014年12月31日

资源勘探中的反问题的数学理论与算法

国家自然科学基金

0+阅读 · 2013年12月31日

脂肪酸代谢调节重编程的研究

国家自然科学基金

0+阅读 · 2013年12月31日

可压缩Navier-Stokes方程和Boltzmann方程解的渐近行为

国家自然科学基金

0+阅读 · 2013年12月31日

机械式自动变速器的滚动优化控制

国家自然科学基金

0+阅读 · 2012年12月31日

关于AI-半环簇与 Conway半环簇的研究

国家自然科学基金

1+阅读 · 2012年12月31日

Riemann-Hilbert方法及若干相关问题的研究

国家自然科学基金

0+阅读 · 2012年12月31日

玉米幼苗干旱胁迫应答NAC转录因子基因的筛选和鉴定

国家自然科学基金

0+阅读 · 2012年12月31日

过渡金属催化卤代芳烃对芳醛的Barbier类型反应研究

国家自然科学基金

0+阅读 · 2009年12月31日

Compressed Vision for Efficient Video Understanding

Arxiv

0+阅读 · 2022年10月6日

XDoc: Unified Pre-training for Cross-Format Document Understanding

Arxiv

0+阅读 · 2022年10月6日

Detect, Retrieve, Comprehend: A Flexible Framework for Zero-Shot Document-Level Question Answering

Arxiv

0+阅读 · 2022年10月4日

One Transformer Can Understand Both 2D & 3D Molecular Data

Arxiv

0+阅读 · 2022年10月4日

Systematic Generalization and Emergent Structures in Transformers Trained on Structured Tasks

Arxiv

0+阅读 · 2022年10月2日

Transformers Meet Visual Learning Understanding: A Comprehensive Review

Arxiv

28+阅读 · 2022年3月24日

Data Augmentation using Pre-trained Transformer Models

Arxiv

17+阅读 · 2020年3月4日

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

Arxiv

12+阅读 · 2020年2月19日

Commonsense Reasoning for Natural Language Understanding: A Survey of Benchmarks, Resources, and Approaches

Arxiv

16+阅读 · 2019年4月2日

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Arxiv

15+阅读 · 2018年10月11日

VIP会员

文章信息

相关主题

相关VIP内容

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

325+阅读 · 2020年11月26日

2020数据工程师成长路线图

专知会员服务

41+阅读 · 2020年9月6日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

163+阅读 · 2019年10月12日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

机器人领域中最佳的三维场景表示是什么？——从几何表示到基础模型

《多域作战兵棋推演：运用形态学分析与人工智能加强国防人员训练》

【博士论文】快速高效的归一化流及其在图像生成模型中的应用

仿生机器人技术的军事应用

相关资讯

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

谷歌发表的史上最强NLP模型BERT的官方代码和预训练模型可以下载了

谷歌发表的史上最强NLP模型BERT的官方代码和预训练模型可以下载了

AINLP

12+阅读 · 2018年11月1日

【论文推荐】最新六篇自动问答相关论文—无监督迁移学习、综述、生成式问答、QDEE、可扩展文档理解

【论文推荐】最新六篇自动问答相关论文—无监督迁移学习、综述、生成式问答、QDEE、可扩展文档理解

专知

12+阅读 · 2018年5月9日

【论文推荐】最新5篇信息抽取（IE）相关论文—开放信息抽取、不完整信息、主动学习、越南语、依存分析

【论文推荐】最新5篇信息抽取（IE）相关论文—开放信息抽取、不完整信息、主动学习、越南语、依存分析

专知

12+阅读 · 2018年2月2日

相关论文

Compressed Vision for Efficient Video Understanding

Arxiv

0+阅读 · 2022年10月6日

XDoc: Unified Pre-training for Cross-Format Document Understanding

Arxiv

0+阅读 · 2022年10月6日

Detect, Retrieve, Comprehend: A Flexible Framework for Zero-Shot Document-Level Question Answering

Arxiv

0+阅读 · 2022年10月4日

One Transformer Can Understand Both 2D & 3D Molecular Data

Arxiv

0+阅读 · 2022年10月4日

Systematic Generalization and Emergent Structures in Transformers Trained on Structured Tasks

Arxiv

0+阅读 · 2022年10月2日

Transformers Meet Visual Learning Understanding: A Comprehensive Review

Arxiv

28+阅读 · 2022年3月24日

Data Augmentation using Pre-trained Transformer Models

Arxiv

17+阅读 · 2020年3月4日

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

Arxiv

12+阅读 · 2020年2月19日

Commonsense Reasoning for Natural Language Understanding: A Survey of Benchmarks, Resources, and Approaches

Arxiv

16+阅读 · 2019年4月2日

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Arxiv

15+阅读 · 2018年10月11日

相关基金

MHC-B单倍型鸡MD抗性相关miRNA的鉴定及功能靶基因研究

国家自然科学基金

0+阅读 · 2015年12月31日

面向机器翻译的多词表达语义分析及应用研究

国家自然科学基金

1+阅读 · 2014年12月31日

资源勘探中的反问题的数学理论与算法

国家自然科学基金

0+阅读 · 2013年12月31日

脂肪酸代谢调节重编程的研究

国家自然科学基金

0+阅读 · 2013年12月31日

可压缩Navier-Stokes方程和Boltzmann方程解的渐近行为

国家自然科学基金

0+阅读 · 2013年12月31日

机械式自动变速器的滚动优化控制

国家自然科学基金

0+阅读 · 2012年12月31日

关于AI-半环簇与 Conway半环簇的研究

国家自然科学基金

1+阅读 · 2012年12月31日

Riemann-Hilbert方法及若干相关问题的研究

国家自然科学基金

0+阅读 · 2012年12月31日

玉米幼苗干旱胁迫应答NAC转录因子基因的筛选和鉴定

国家自然科学基金

0+阅读 · 2012年12月31日

过渡金属催化卤代芳烃对芳醛的Barbier类型反应研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员