Unconstrained handwritten text recognition is a challenging computer vision task. It is traditionally handled by a two-step approach: line segmentation followed by text line recognition. For the first time, we propose an end-to-end segmentation-free architecture for the task of handwritten document recognition: the Document Attention Network. In addition to text recognition, the model is trained to label text parts using begin and end tags in an XML-like fashion. The model is made up of an FCN encoder for feature extraction and a stack of transformer decoder layers for a recurrent token-by-token prediction process. It takes whole text documents as input and sequentially outputs characters as well as logical layout tokens. Contrary to existing segmentation-based approaches, the model is trained without any segmentation labels. We achieve competitive results on the READ 2016 dataset at page level and double-page level, with CERs of 3.43% and 3.70%, respectively. We also report results on the RIMES 2009 dataset at page level, reaching a CER of 4.54%. We provide all source code and pre-trained model weights at https://github.com/FactoDeepLearning/DAN.
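To make the described architecture concrete, the sketch below shows, under stated assumptions, how an FCN-style convolutional encoder can feed a stack of transformer decoder layers that predict characters and XML-like begin/end layout tags one token at a time. This is a minimal illustrative sketch, not the authors' implementation: the module names, layer sizes, toy vocabulary, and mask construction are assumptions for the example; the actual model is available at https://github.com/FactoDeepLearning/DAN.

```python
# Minimal sketch of an FCN encoder + transformer decoder predicting a mixed
# sequence of characters and layout tags. All sizes and names are illustrative.
import torch
import torch.nn as nn

class TinyDocumentAttentionModel(nn.Module):
    def __init__(self, vocab_size, d_model=256, num_layers=2, nhead=4):
        super().__init__()
        # FCN-like encoder: strided convolutions turn the page image into a
        # 2D grid of d_model-dimensional features.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, d_model, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.embed = nn.Embedding(vocab_size, d_model)
        decoder_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(decoder_layer, num_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, page, tokens):
        # page: (B, 1, H, W) grayscale document image
        # tokens: (B, T) previously emitted characters / layout tags
        feats = self.encoder(page)                 # (B, C, H', W')
        memory = feats.flatten(2).transpose(1, 2)  # (B, H'*W', C) attention memory
        tgt = self.embed(tokens)                   # (B, T, C)
        T = tokens.size(1)
        # Causal mask so each position only attends to earlier predictions.
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        out = self.decoder(tgt, memory, tgt_mask=causal)
        return self.head(out)                      # (B, T, vocab_size) logits

# Toy vocabulary mixing characters with begin/end layout tags (assumed names).
vocab = ["<sos>", "<eos>", "<page>", "</page>", "<p>", "</p>", "a", "b", " "]
model = TinyDocumentAttentionModel(len(vocab))
logits = model(torch.randn(1, 1, 128, 128), torch.tensor([[0, 2, 6]]))
print(logits.shape)  # torch.Size([1, 3, 9])
```

At inference time such a model would be run autoregressively: starting from a start token, the most probable token is appended to the sequence at each step until an end token is produced, so no line-level segmentation is required.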