Scene text recognition in low-resource Indian languages is challenging because of complexities such as multiple scripts, fonts, text sizes, and orientations. In this work, we investigate the power of transfer learning for all the layers of deep scene text recognition networks from English to two common Indian languages. We perform experiments on the conventional CRNN model and STAR-Net to ensure generalisability. To study the effect of changing scripts, we first run our experiments on synthetic word images rendered using Unicode fonts. We show that transferring English models to simple synthetic datasets of Indian languages is not practical. Instead, we propose applying transfer learning among Indian languages, because of the similarity in their n-gram distributions and in visual features such as vowels and conjunct characters. We then study transfer learning among six Indian languages with varying complexities in fonts and word length statistics. We also demonstrate that the features learned by models transferred from other Indian languages are visually closer to (and sometimes even better than) the features of the individual models than those transferred from English. Finally, we set new benchmarks for scene text recognition on the Hindi, Telugu, and Malayalam datasets from IIIT-ILST and the Bangla dataset from MLT-17, achieving gains of 6%, 5%, 2%, and 23% in Word Recognition Rate (WRR) over previous works. We further improve the MLT-17 Bangla results by plugging a novel correction BiLSTM into our model. We additionally release a dataset of around 440 scene images containing 500 Gujarati and 2535 Tamil words. WRRs improve over the baselines by 8%, 4%, 5%, and 3% on the MLT-19 Hindi and Bangla datasets and the Gujarati and Tamil datasets, respectively.
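The results above are reported as Word Recognition Rate (WRR). As a minimal sketch, assuming WRR is the percentage of word images whose predicted transcription exactly matches the ground truth, the metric could be computed as follows. The function name `word_recognition_rate` and the sample strings are hypothetical and used only for illustration; they are not taken from the paper's code.

```python
def word_recognition_rate(predictions, ground_truths):
    """Return the percentage of predictions that exactly match the ground truth.

    Assumption: a word counts as correct only if the whole predicted string
    equals the ground-truth string (no partial credit for near misses).
    """
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have the same length")
    if not ground_truths:
        raise ValueError("cannot compute WRR on an empty dataset")

    # Count exact string matches, pairing each prediction with its label.
    correct = sum(p == g for p, g in zip(predictions, ground_truths))
    return 100.0 * correct / len(ground_truths)


# Hypothetical recogniser outputs and labels for four word images.
predicted = ["नमस्ते", "భారత", "kerala", "bangla"]
expected = ["नमस्ते", "భారత", "kerela", "bangla"]

wrr = word_recognition_rate(predicted, expected)
print(f"WRR: {wrr:.1f}%")
```

Under this definition, a single wrong character makes the whole word count as an error, so WRR is stricter than a character-level metric.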