Until recently, the number of public real-world text images was insufficient for training scene text recognizers. Therefore, most modern training methods rely on synthetic data and operate in a fully supervised manner. Nevertheless, the number of public real-world text images has increased significantly of late, including a great deal of unlabeled data. Leveraging these resources requires semi-supervised approaches; however, the few existing methods do not account for the vision-language multimodal structure and are therefore suboptimal for state-of-the-art multimodal architectures. To bridge this gap, we present semi-supervised learning for multimodal text recognizers (SemiMTR) that leverages unlabeled data at each modality's training phase. Notably, our method refrains from extra training stages and maintains the current three-stage multimodal training procedure. Our algorithm starts by pretraining the vision model through a single-stage training that unifies self-supervised learning with supervised training. More specifically, we extend an existing visual representation learning algorithm and propose the first contrastive-based method for scene text recognition. After pretraining the language model on a text corpus, we fine-tune the entire network via sequential, character-level consistency regularization between weakly and strongly augmented views of text images. In a novel setup, consistency is enforced on each modality separately. Extensive experiments validate that our method outperforms the current training schemes and achieves state-of-the-art results on multiple scene text recognition benchmarks.
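To make the fine-tuning stage concrete, the following is a minimal PyTorch sketch of character-level consistency regularization between weakly and strongly augmented views, with consistency enforced per modality as the abstract describes. All names here (`recognizer`, `ema_teacher`, `weak_aug`, `strong_aug`, the dict-of-modalities output format, and the confidence threshold) are illustrative assumptions, not the paper's actual interface.

```python
import torch
import torch.nn.functional as F

def consistency_loss(student_logits, teacher_logits, conf_threshold=0.5):
    """Character-level consistency: per-character predictions from the
    teacher on the weak view supervise the student on the strong view.

    Both tensors are assumed to have shape (batch, seq_len, num_classes).
    """
    with torch.no_grad():
        probs = teacher_logits.softmax(dim=-1)   # (B, T, C)
        conf, pseudo = probs.max(dim=-1)         # per-character pseudo-labels
        mask = conf.ge(conf_threshold)           # keep confident characters only
    loss = F.cross_entropy(
        student_logits.flatten(0, 1),            # (B*T, C)
        pseudo.flatten(),                        # (B*T,)
        reduction="none",
    )
    return (loss * mask.flatten()).sum() / mask.sum().clamp(min=1)

def semi_supervised_step(recognizer, ema_teacher, images, weak_aug, strong_aug):
    """Hypothetical training step on an unlabeled batch. Both models are
    assumed to return a dict mapping each modality (e.g. 'vision',
    'fusion') to character logits, so consistency is applied to each
    modality separately."""
    weak_out = ema_teacher(weak_aug(images))
    strong_out = recognizer(strong_aug(images))
    return sum(consistency_loss(strong_out[m], weak_out[m]) for m in strong_out)
```

In this sketch the teacher is a momentum (EMA) copy of the recognizer, a common choice in consistency-based semi-supervised learning; the paper's exact teacher/student arrangement and augmentation policies may differ.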