手写词辨识自培训,促进合成人到实际适应 (Self-Training of Handwritten Word Recognition for Synthetic-to-Real Adaptation)

Performances of Handwritten Text Recognition (HTR) models are largely determined by the availability of labeled and representative training samples. However, in many application scenarios labeled samples are scarce or costly to obtain. In this work, we propose a self-training approach to train a HTR model solely on synthetic samples and unlabeled data. The proposed training scheme uses an initial model trained on synthetic data to make predictions for the unlabeled target dataset. Starting from this initial model with rather poor performance, we show that a considerable adaptation is possible by training against the predicted pseudo-labels. Moreover, the investigated self-training strategy does not require any manually annotated training samples. We evaluate the proposed method on four widely used benchmark datasets and show its effectiveness on closing the gap to a model trained in a fully-supervised manner.

翻译：手写文本识别模型(HTR)的性能主要取决于是否有贴标签和具有代表性的培训样本,然而,在许多应用中,贴标签的样本很少,或获取成本很高;在这项工作中,我们提议采用自我培训办法,只对合成样品和无标签数据进行HTR模型培训;拟议培训计划使用经过合成数据培训的初始模型,对未贴标签的目标数据集作出预测;从这一初始模型开始,我们表现较差,我们表明通过针对预测的假标签进行培训,可以进行大量调整;此外,调查的自我培训战略不需要人工附加说明的培训样本;我们评估了四个广泛使用的基准数据集的拟议方法,并展示了该方法在缩小差距方面的效力,以完全监督的方式培训了模型。

相关内容

MoDELS

关注 43

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

33页PPT【AI+天气预测】，AI and Machine learning for weather predictions

专知会员服务

34+阅读 · 2022年3月5日

NeurIPS2021 | Cycle Self-Training：领域自适应的循环自训练方法与理论

专知会员服务

20+阅读 · 2021年11月13日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日