Automatic Arabic handwriting recognition is a recently studied problem in the field of Machine Learning. Unlike languages written in the Latin script, Arabic, a Semitic language, poses a harder challenge, especially given the variability in handwriting patterns caused by factors such as the writer's age. Most studies have focused on adults; only one recent study has addressed children. Moreover, many recent Machine Learning methods rely on Convolutional Neural Networks (CNNs), a powerful class of neural networks that can extract complex features from images. In this paper, we propose a CNN model that recognizes children's handwriting with an accuracy of 91% on the Hijja dataset, a recent dataset of images of Arabic characters written by children, and 97% on the Arabic Handwritten Character Dataset (AHCD). The results show a clear improvement over the model proposed by the authors of the Hijja dataset, yet they also reveal that recognizing children's handwritten Arabic characters remains a bigger challenge. Moreover, we propose a new approach that uses multiple models, selected according to the number of strokes in a character, instead of a single model, and we merge Hijja with AHCD; this setup reaches an average prediction accuracy of 96%.
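The abstract does not specify how an input is routed to one of the multiple models or how the 96% average is computed. The Python sketch below illustrates one plausible reading under stated assumptions: the stroke count of each sample is already known (for example, from a hypothetical upstream stroke counter), each stroke-count group has its own classifier, and "average prediction accuracy" is taken as the unweighted mean of per-group accuracies. The functions `predict` and `averaged_accuracy`, the toy classifiers, the sample names, and the letter-to-group assignments are all hypothetical placeholders, not the authors' models or data.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical: a classifier maps an image (any object here) to a label.
Classifier = Callable[[object], str]


def predict(image: object, stroke_count: int, models: Dict[int, Classifier]) -> str:
    """Route an image to the model trained for its stroke-count group.

    Assumption: the stroke count is known in advance; the abstract does not
    say how it is obtained.
    """
    return models[stroke_count](image)


def averaged_accuracy(
    samples: Iterable[Tuple[object, int, str]], models: Dict[int, Classifier]
) -> float:
    """Unweighted mean of per-group accuracies (an assumed definition)."""
    correct: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for image, stroke_count, label in samples:
        total[stroke_count] += 1
        if predict(image, stroke_count, models) == label:
            correct[stroke_count] += 1
    per_group = [correct[g] / total[g] for g in total]
    return sum(per_group) / len(per_group)


# Toy stand-ins for trained CNNs; the groupings are illustrative only.
models: Dict[int, Classifier] = {
    1: lambda img: "alef",  # single-letter group: always right here
    2: lambda img: "ba",    # deliberately imperfect: never predicts "nun"
}

samples = [
    ("alef_01", 1, "alef"),
    ("alef_02", 1, "alef"),
    ("ba_01", 2, "ba"),
    ("ba_02", 2, "ba"),
    ("nun_01", 2, "nun"),
]

# Group 1: 2/2 correct, group 2: 2/3 correct, mean = 5/6.
print(round(averaged_accuracy(samples, models), 3))  # prints 0.833
```

Whether the paper weights groups equally or by sample count is not stated; weighting by `total[g]` would instead reproduce plain overall accuracy (4/5 in this toy case).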