The Transformer has quickly become the dominant architecture for various pattern recognition tasks thanks to its capacity to model long-range dependencies. However, Transformers are data-hungry models that require large datasets for training. In Handwritten Text Recognition (HTR), collecting a massive amount of labeled data is a complicated and expensive task. In this paper, we propose a lite Transformer architecture for full-page, multi-script handwriting recognition. The proposed model offers three advantages. First, to address the common problem of data scarcity, the lite Transformer can be trained on a reasonable amount of data, which is the case for most public HTR datasets, without the need for external data. Second, it learns the reading order at page level thanks to a curriculum learning strategy, allowing it to avoid line segmentation errors, exploit a larger context, and reduce the need for costly segmentation annotations. Third, it can easily be adapted to other scripts through a simple transfer learning process using only page-level labeled images. Extensive experiments on datasets covering different scripts (French, English, Spanish, and Arabic) demonstrate the effectiveness of the proposed model.
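To make the page-level curriculum idea concrete, the following is a minimal, hypothetical sketch (not the authors' code): training targets start with a single text line and are progressively extended to more lines until whole pages are used, so the model gradually learns the reading order. The names `PageSample`, `max_lines_for_epoch`, and the schedule constants are illustrative assumptions only.

```python
# Hypothetical sketch of a page-level curriculum schedule for HTR training.
# Not the paper's implementation; names and constants are assumptions.
from dataclasses import dataclass


@dataclass
class PageSample:
    page_id: str
    lines: list  # line transcriptions in reading order (images omitted here)


def max_lines_for_epoch(epoch: int, start: int = 1, step_every: int = 5,
                        page_max: int = 25) -> int:
    """Curriculum schedule: allow one line at first, then a few more every
    `step_every` epochs, until full pages (`page_max` lines) are reached."""
    return min(page_max, start + epoch // step_every)


def make_training_target(page: PageSample, epoch: int) -> str:
    """Concatenate the first k lines (k given by the schedule) into one
    target string, so the model incrementally learns the page reading order."""
    k = max_lines_for_epoch(epoch)
    return "\n".join(page.lines[:k])


if __name__ == "__main__":
    page = PageSample("demo-page", [f"line {i}" for i in range(1, 26)])
    for epoch in (0, 5, 50, 200):
        target = make_training_target(page, epoch)
        print(f"epoch {epoch}: {len(target.splitlines())} lines in target")
```

Under this kind of schedule, early epochs resemble line-level HTR, while later epochs expose the model to full pages without any line segmentation at inference time.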