Offline handwritten Chinese character recognition (HCCR) faces two long-standing challenges: Chinese characters are highly diverse and complex, yet many look alike, and cursive handwriting (caused by faster writing and infrequent pen lifting) joins strokes, and even whole characters, in a flowing manner. In this paper, we propose template and instance loss functions for the machine learning tasks involved in offline HCCR. First, the character template is designed to handle the intrinsic similarities among Chinese characters. Second, the instance loss reduces category variance according to classification difficulty, giving a large penalty to outlier instances of handwritten Chinese characters. Extensive experiments show that our deep network architecture, the HCCR14Layer model, which consists of simple layers, achieves performance at or beyond the state of the art for offline HCCR when trained with the new loss functions.