Text in an image often stores important information and directly carries high-level semantics, which makes it an important source of information and a very active research topic. Many studies have shown that CNN-based neural networks are effective and accurate for image classification, which is the basis of text recognition. Their performance can be further improved through transfer learning, using a model pre-trained on the ImageNet dataset as the initial weights. In this research, the recognition model is trained on the Chars74K dataset, and the best model is then tested on samples from the IIIT-5K-Dataset. The results showed that the best accuracy was obtained by the model trained with the VGG-16 architecture, combined with image transformations of 15° rotation and 0.9 image scale and the application of a Gaussian blur effect. This model achieves an accuracy of 97.94% on the validation data, 98.16% on the test data, and 95.62% on the test data from the IIIT-5K-Dataset. Based on these results, it can be concluded that pre-trained CNNs can achieve good accuracy for text recognition, and the model architecture used in this study can serve as reference material for the future development of text detection systems.
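The augmentation settings reported above (15° rotation, 0.9 scale, Gaussian blur) can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' pipeline: it applies a fixed rotation and scale about the image center with nearest-neighbour sampling, then a separable Gaussian blur, on a grayscale image stored as a list of rows. The function names, the blur radius and sigma, the reading of "scale 0.9" as shrinking the content to 90% of its size, and the use of fixed rather than randomly sampled values are all assumptions.

```python
import math

# Hypothetical helpers illustrating the reported augmentations on a grayscale
# image stored as a list of rows (lists of floats). Not the paper's code.

def affine_matrix(angle_deg, scale, cx, cy):
    """2x3 matrix that rotates by angle_deg and scales by `scale` about (cx, cy).

    Uses image coordinates (y axis pointing down), the same form as
    OpenCV's getRotationMatrix2D.
    """
    a = math.radians(angle_deg)
    c, s = scale * math.cos(a), scale * math.sin(a)
    return [[c, s, (1 - c) * cx - s * cy],
            [-s, c, s * cx + (1 - c) * cy]]


def invert_affine(m):
    """Invert [A | t] as [A^-1 | -A^-1 t]."""
    (a, b, tx), (c, d, ty) = m
    det = a * d - b * c
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    return [[ia, ib, -(ia * tx + ib * ty)],
            [ic, id_, -(ic * tx + id_ * ty)]]


def warp_nearest(image, m, fill=0.0):
    """Apply affine m by inverse-mapping each output pixel to a source pixel."""
    h, w = len(image), len(image[0])
    inv = invert_affine(m)
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            sx = inv[0][0] * x + inv[0][1] * y + inv[0][2]
            sy = inv[1][0] * x + inv[1][1] * y + inv[1][2]
            xi, yi = round(sx), round(sy)
            # Pixels that map outside the source image get the fill value.
            row.append(image[yi][xi] if 0 <= xi < w and 0 <= yi < h else fill)
        out.append(row)
    return out


def gaussian_kernel(radius, sigma):
    """Normalized 1D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [wt / total for wt in weights]


def _clamp(v, lo, hi):
    return max(lo, min(hi, v))


def gaussian_blur(image, radius=1, sigma=1.0):
    """Separable Gaussian blur (horizontal then vertical pass, edges clamped)."""
    k = gaussian_kernel(radius, sigma)
    h, w = len(image), len(image[0])
    offsets = range(-radius, radius + 1)
    tmp = [[sum(k[j + radius] * image[y][_clamp(x + j, 0, w - 1)] for j in offsets)
            for x in range(w)] for y in range(h)]
    return [[sum(k[j + radius] * tmp[_clamp(y + j, 0, h - 1)][x] for j in offsets)
             for x in range(w)] for y in range(h)]


def augment(image, angle_deg=15.0, scale=0.9, radius=1, sigma=1.0):
    """Rotate and scale about the image center, then blur (fixed parameters)."""
    h, w = len(image), len(image[0])
    m = affine_matrix(angle_deg, scale, (w - 1) / 2, (h - 1) / 2)
    return gaussian_blur(warp_nearest(image, m), radius, sigma)
```

In a real training pipeline these transformations would typically be sampled randomly per image (for example, a rotation angle drawn from within ±15°) and applied with an image library such as torchvision or OpenCV; the plain-list version here only shows what each step computes.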