Scene-text spotting is the task of detecting text regions in natural scene images and recognizing their characters simultaneously. It has attracted much attention in recent years owing to its wide range of applications. Existing research has mainly focused on improving text-region detection rather than text recognition. Consequently, although detection accuracy has improved, end-to-end accuracy remains insufficient. Text in natural scene images tends to be not a random string of characters but a meaningful one, i.e., a word. We therefore propose adversarial learning of semantic representations for scene text spotting (A3S) to improve end-to-end accuracy, including text recognition. A3S simultaneously predicts semantic features in the detected text region instead of performing text recognition based only on visual features. Experimental results on publicly available datasets show that the proposed method achieves better accuracy than other methods.
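To make the core idea concrete, the sketch below shows one plausible way to set up adversarial learning of semantic representations: a semantic head predicts a word-embedding-like vector from the visual features of a detected text region, while a discriminator is trained to distinguish predicted semantic features from true word embeddings. All names, shapes, and the linear-discriminator choice are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

D_VIS, D_SEM = 8, 4  # toy visual- and semantic-feature sizes (assumed)

W_sem = rng.normal(size=(D_VIS, D_SEM)) * 0.1   # semantic head ("generator")
w_disc = rng.normal(size=D_SEM) * 0.1           # linear discriminator (toy)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def semantic_head(visual_feat):
    """Predict a semantic (word-embedding-like) vector from visual features."""
    return visual_feat @ W_sem

def disc_loss(true_emb, pred_emb):
    """Binary cross-entropy: the discriminator labels true word embeddings
    as real (1) and predicted semantic features as fake (0)."""
    p_true = sigmoid(true_emb @ w_disc)
    p_pred = sigmoid(pred_emb @ w_disc)
    return float(-np.log(p_true + 1e-9) - np.log(1.0 - p_pred + 1e-9))

def gen_loss(pred_emb):
    """The semantic head is trained adversarially: its predictions should
    look like real word embeddings to the discriminator."""
    return float(-np.log(sigmoid(pred_emb @ w_disc) + 1e-9))

visual_feat = rng.normal(size=D_VIS)  # stand-in for detected-region features
true_emb = rng.normal(size=D_SEM)     # stand-in for the ground-truth word embedding
pred = semantic_head(visual_feat)
print(disc_loss(true_emb, pred), gen_loss(pred))
```

In a full system, this adversarial objective would be minimized jointly with the usual detection and recognition losses, so the recognizer's features are pushed toward word-level semantics rather than purely visual patterns.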