Convolutional Neural Networks (CNNs) have shown promising results for the task of Handwritten Text Recognition (HTR), but they still fall behind Recurrent Neural Network (RNN)/Transformer-based models in terms of performance. In this paper, we propose a CNN-based architecture that bridges this gap. Our work, Easter2.0, is composed of multiple layers of 1D Convolution, Batch Normalization, ReLU, Dropout, Dense Residual connections, and a Squeeze-and-Excitation module, and makes use of Connectionist Temporal Classification (CTC) loss. In addition to the Easter2.0 architecture, we propose a simple and effective data augmentation technique, 'Tiling and Corruption (TACO)', relevant to the task of HTR/OCR. Our work achieves state-of-the-art results on the IAM handwriting database when trained using only publicly available training data. In our experiments, we also present the impact of TACO augmentations and Squeeze-and-Excitation (SE) on text recognition accuracy. We further show that Easter2.0 is suitable for few-shot learning tasks and outperforms current best methods, including Transformers, when trained on limited amounts of annotated data. Code and models are available at: https://github.com/kartikgill/Easter2
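The abstract names TACO only as "Tiling and Corruption"; a tiling-style augmentation can be sketched as below. This is a hypothetical NumPy illustration, assuming tiles are taken along the image width and corrupted by blanking; the function name, parameters, and blanking strategy are assumptions for illustration, not the paper's exact recipe.

```python
import numpy as np

def taco_augment(image, tile_width=16, corruption_prob=0.3, rng=None):
    """Tile an image along its width and blank out random tiles.

    Hypothetical sketch of a Tiling-and-Corruption style augmentation;
    parameter names and the white-out corruption are assumptions.
    image : (height, width) array, values in [0, 1]
    """
    rng = np.random.default_rng() if rng is None else rng
    out = image.astype(float).copy()
    h, w = image.shape[:2]
    for x in range(0, w, tile_width):
        if rng.random() < corruption_prob:
            # Corrupt this tile: white it out (assumes white background = 1.0)
            out[:, x:x + tile_width] = 1.0
    return out
```

Such width-wise corruption forces the recognizer to rely on surrounding context rather than any single stroke region, which is the usual motivation for occlusion-style augmentations in text recognition.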
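The Squeeze-and-Excitation module mentioned above follows a standard pattern: global average pooling ("squeeze"), a bottleneck MLP with a sigmoid gate ("excite"), and per-channel rescaling. A minimal NumPy sketch over a 1D feature map is shown below; the weight names and reduction ratio are generic SE conventions, not values taken from the Easter2.0 code.

```python
import numpy as np

def squeeze_excite(x, w1, w2):
    """Squeeze-and-Excitation over a 1D feature map.

    x  : (timesteps, channels) feature map
    w1 : (channels, channels // r) reduction weights
    w2 : (channels // r, channels) expansion weights
    Generic SE sketch; shapes and names are illustrative assumptions.
    """
    # Squeeze: global average pool over the time axis
    z = x.mean(axis=0)                      # (channels,)
    # Excite: bottleneck MLP, ReLU then sigmoid gate
    s = np.maximum(z @ w1, 0.0)             # (channels // r,)
    gate = 1.0 / (1.0 + np.exp(-(s @ w2)))  # (channels,)
    # Recalibrate: scale each channel by its learned gate
    return x * gate
```

The gate lets the network emphasize informative channels and suppress weak ones at negligible parameter cost, which is why SE blocks pair well with lightweight 1D convolutional stacks.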