This paper presents an end-to-end deep convolutional recurrent neural network solution for the Khmer optical character recognition (OCR) task. The proposed solution uses a sequence-to-sequence (Seq2Seq) architecture with an attention mechanism. The encoder extracts visual features from an input text-line image through layers of residual convolutional blocks and a layer of gated recurrent units (GRU). The features are encoded into a single context vector and a sequence of hidden states, which are fed to the decoder to decode one character at a time until a special end-of-sentence (EOS) token is reached. The attention mechanism allows the decoder network to adaptively select parts of the input image while predicting a target character. The Seq2Seq Khmer OCR network was trained on a large collection of computer-generated text-line images covering seven common Khmer fonts. The proposed model outperformed the state-of-the-art Tesseract OCR engine for the Khmer language on a 3,000-image test set, achieving a character error rate (CER) of 1% versus 3%.
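The following is a minimal PyTorch sketch of the architecture described above (residual-style convolutional feature extractor, a GRU encoder producing hidden states plus a context vector, and an attention-based GRU decoder emitting one character at a time). Layer sizes, the convolutional stack, the attention scoring function, and all class names are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvGRUEncoder(nn.Module):
    """Extracts visual features from a text-line image, then runs a GRU over
    the feature columns to produce a sequence of hidden states and a single
    context vector (the final hidden state)."""

    def __init__(self, hidden_size=256):
        super().__init__()
        self.cnn = nn.Sequential(                    # stand-in for residual conv blocks
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.gru = nn.GRU(128, hidden_size, batch_first=True)

    def forward(self, images):                       # images: (B, 1, H, W)
        feats = self.cnn(images)                     # (B, C, H', W')
        feats = feats.mean(dim=2).permute(0, 2, 1)   # collapse height -> (B, W', C)
        states, context = self.gru(feats)            # states: (B, W', H); context: (1, B, H)
        return states, context


class AttentionDecoder(nn.Module):
    """Decodes one character per step, attending over the encoder states;
    decoding stops when the EOS token is produced."""

    def __init__(self, vocab_size, hidden_size=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.attn = nn.Linear(hidden_size * 2, 1)    # simple additive-style score
        self.gru = nn.GRU(hidden_size * 2, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, prev_char, hidden, enc_states):
        emb = self.embed(prev_char).unsqueeze(1)     # (B, 1, H)
        # Score each encoder state against the current decoder hidden state.
        query = hidden[-1].unsqueeze(1).expand(-1, enc_states.size(1), -1)
        scores = self.attn(torch.cat([enc_states, query], dim=-1)).squeeze(-1)
        weights = F.softmax(scores, dim=-1)          # attention over image columns
        ctx = torch.bmm(weights.unsqueeze(1), enc_states)      # (B, 1, H)
        output, hidden = self.gru(torch.cat([emb, ctx], dim=-1), hidden)
        return self.out(output.squeeze(1)), hidden   # logits over the character set
```

At inference time, the decoder would be run greedily: initialize its hidden state with the encoder's context vector, feed a start token, and repeatedly feed back the predicted character until EOS is emitted.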