When handling complicated text images (e.g., irregular structures, low resolution, heavy occlusion, and uneven illumination), existing supervised text recognition methods are data-hungry. Although these methods employ large-scale synthetic text images to reduce the dependence on annotated real images, the domain gap between synthetic and real data still limits recognition performance. Therefore, learning robust text feature representations from unlabeled real images via self-supervised learning is a promising solution. However, existing self-supervised text recognition methods perform only sequence-to-sequence representation learning by roughly splitting the visual features along the horizontal axis, which damages character structures. Moreover, these sequence-level self-supervised methods restrict the use of geometry-based data augmentation, because large geometric transformations lead to sequence-to-sequence inconsistency. To address these issues, we propose a novel self-supervised character-to-character distillation method, CCD. Specifically, we delineate the character structures of unlabeled real images with a self-supervised character segmentation module and then use the segmentation results to build character-level representation learning. CCD differs from prior work in that it introduces a character-level pretext task to learn more fine-grained feature representations. In addition, in contrast to the inflexible augmentations of sequence-to-sequence models, CCD maintains character-to-character representation consistency across various transformations (e.g., geometry and colour), yielding robust text features in the representation space. Experiments demonstrate that CCD achieves state-of-the-art performance on publicly available text recognition benchmarks.
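
The contrast between sequence-level and character-level pooling can be illustrated with a toy example. The sketch below is not the CCD implementation: the one-channel feature map, the hand-made character masks, and every function name are assumptions made for illustration, and a mirror flip stands in for a generic geometric transformation. It shows only that pooling inside character masks gives the same per-character features before and after the transformation, whereas equal-width horizontal strips mix neighbouring characters and lose their correspondence.

```python
# Illustrative sketch only: a toy one-channel "feature map" and hand-made
# character masks. None of these names come from the CCD paper.

def masked_mean(feature, mask):
    """Average the feature values inside one character mask (mask == 1)."""
    values = [f for frow, mrow in zip(feature, mask)
              for f, m in zip(frow, mrow) if m]
    return sum(values) / len(values)

def horizontal_split_means(feature, num_chunks):
    """Sequence-level pooling: cut the map into equal-width vertical strips."""
    step = len(feature[0]) // num_chunks
    means = []
    for k in range(num_chunks):
        strip = [v for row in feature for v in row[k * step:(k + 1) * step]]
        means.append(sum(strip) / len(strip))
    return means

def hflip(grid):
    """Toy geometric augmentation: mirror every row."""
    return [row[::-1] for row in grid]

# 2x6 map: a narrow character (columns 0-1) and a wide one (columns 2-5).
feature = [[1.0, 1.0, 5.0, 5.0, 5.0, 5.0],
           [1.0, 1.0, 5.0, 5.0, 5.0, 5.0]]
mask_a = [[1, 1, 0, 0, 0, 0]] * 2
mask_b = [[0, 0, 1, 1, 1, 1]] * 2

# Character-level pooling: the same flip is applied to features and masks,
# so each character keeps its own region after the transformation.
flipped = hflip(feature)
char_feats = [masked_mean(feature, m) for m in (mask_a, mask_b)]
char_feats_flipped = [masked_mean(flipped, hflip(m)) for m in (mask_a, mask_b)]

# Sequence-level pooling: fixed strips ignore where the characters are.
seq_feats = horizontal_split_means(feature, 2)
seq_feats_flipped = horizontal_split_means(flipped, 2)

print(char_feats, char_feats_flipped)  # per-character features match
print(seq_feats, seq_feats_flipped)    # strip features do not match
```

In this toy case the first strip averages values from both characters, which is the kind of structural damage the abstract attributes to horizontal splitting.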