Existing text recognition methods typically require large-scale training data. Most of them rely on synthetic training data due to the lack of annotated real images. However, there is a domain gap between synthetic and real data, which limits the performance of text recognition models. Recent self-supervised text recognition methods have attempted to utilize unlabeled real images by introducing contrastive learning, which mainly learns to discriminate between text images. Inspired by the observation that humans learn to recognize text through both reading and writing, we propose to learn discrimination and generation by integrating contrastive learning and masked image modeling in our self-supervised method. The contrastive learning branch is adopted to learn the discrimination of text images, imitating the reading behavior of humans. Meanwhile, masked image modeling is introduced to text recognition for the first time to learn the context generation of text images, which resembles the writing behavior. The experimental results show that our method outperforms previous self-supervised text recognition methods by 10.2%-20.2% on irregular scene text recognition datasets. Moreover, our proposed text recognizer exceeds previous state-of-the-art text recognition methods by an average of 5.3% on 11 benchmarks, with a similar model size. We also demonstrate that our pre-trained model can be easily applied to other text-related tasks with notable performance gains.
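To make the dual-branch objective concrete, below is a minimal sketch of how a contrastive ("reading") loss and a masked-image-modeling ("writing") loss could be combined during pre-training. This is an illustrative assumption, not the authors' implementation: the function names, tensor shapes, temperature, and the weighting factor `lam` are all hypothetical.

```python
# Illustrative sketch only. Assumes an encoder has already produced:
#   z1, z2:          (B, D) projected features of two augmented views
#   pred, target:    (B, N, P) predicted / ground-truth patch pixels
#   mask:            (B, N) binary mask, 1 where a patch was masked
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """Contrastive (InfoNCE) loss: discriminate matching view pairs."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature           # (B, B) similarity matrix
    targets = torch.arange(z1.size(0))           # positives on the diagonal
    return F.cross_entropy(logits, targets)

def mim_loss(pred, target, mask):
    """Masked-image-modeling loss: reconstruct only the masked patches."""
    per_patch = (pred - target).pow(2).mean(dim=-1)      # (B, N) MSE
    return (per_patch * mask).sum() / mask.sum().clamp(min=1)

def combined_loss(z1, z2, pred, target, mask, lam=0.1):
    """Joint objective: discrimination (reading) + generation (writing).
    The weight lam is an assumed hyperparameter, not from the paper."""
    return info_nce(z1, z2) + lam * mim_loss(pred, target, mask)
```

The key design point the sketch illustrates is that both losses share the same encoder, so the learned representation is shaped simultaneously by instance discrimination and by context reconstruction.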