OCR-free Document Understanding Transformer - Zhuanzhi (专知) Papers

October 4, 2022

OCR-free Document Understanding Transformer

Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park

From arXiv; ECCV 2022. (v4) Updated Table 2 and figures; added LayoutLM and updated scores with the latest test script at https://github.com/clovaai/donut

Understanding document images (e.g., invoices) is a core but challenging task since it requires complex functions such as reading text and a holistic understanding of the document. Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs. Although such OCR-based approaches have shown promising performance, they suffer from 1) high computational costs for using OCR; 2) inflexibility of OCR models on languages or types of document; 3) OCR error propagation to the subsequent process. To address these issues, in this paper, we introduce a novel OCR-free VDU model named Donut, which stands for Document understanding transformer. As the first step in OCR-free VDU research, we propose a simple architecture (i.e., Transformer) with a pre-training objective (i.e., cross-entropy loss). Donut is conceptually simple yet effective. Through extensive experiments and analyses, we show a simple OCR-free VDU model, Donut, achieves state-of-the-art performances on various VDU tasks in terms of both speed and accuracy. In addition, we offer a synthetic data generator that helps the model pre-training to be flexible in various languages and domains. The code, trained model and synthetic data are available at https://github.com/clovaai/donut.
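The abstract names cross-entropy loss as the pre-training objective. As a rough illustration of that objective in general, the sketch below computes the mean token-level cross-entropy from raw logits for a toy target sequence. It is not Donut's training code (that lives in the linked repository); the vocabulary, logits, and function names are hypothetical and chosen only for this example.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vector of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def token_cross_entropy(
    logits_per_step: Sequence[Sequence[float]],
    target_ids: Sequence[int],
) -> float:
    """Mean negative log-likelihood of the target token at each decoding step."""
    if len(logits_per_step) != len(target_ids):
        raise ValueError("need exactly one target id per decoding step")
    if not target_ids:
        raise ValueError("empty target sequence")
    total = 0.0
    for logits, target in zip(logits_per_step, target_ids):
        total -= log_softmax(logits)[target]
    return total / len(target_ids)


# Hypothetical toy setup: a 4-token vocabulary and a 3-step target sequence.
vocab = ["<s>", "total", "42", "</s>"]
targets = [1, 2, 3]  # "total", "42", "</s>"

# A decoder that puts most of its mass on the correct token at each step...
confident = [
    [0.0, 5.0, 0.0, 0.0],
    [0.0, 0.0, 5.0, 0.0],
    [0.0, 0.0, 0.0, 5.0],
]
# ...versus one that spreads its mass evenly over the vocabulary.
uniform = [[0.0] * len(vocab) for _ in targets]

loss_confident = token_cross_entropy(confident, targets)
loss_uniform = token_cross_entropy(uniform, targets)
print(f"confident: {loss_confident:.4f}, uniform: {loss_uniform:.4f}")
```

With uniform predictions the loss equals log(4), the entropy of a uniform guess over four tokens. Predictions concentrated on the correct tokens give a lower loss, which is the quantity such training minimizes.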
