Signage is everywhere, and a robot should be able to take advantage of signs to help it localize (including via Visual Place Recognition (VPR)) and map. Robust text detection and recognition in the wild is challenging due to factors such as pose, irregular text, illumination, and occlusion. We propose an end-to-end scene text spotting model that simultaneously outputs the text string and its bounding boxes, making it well suited to VPR. Our central contribution is the use of an end-to-end scene text spotting framework to adequately capture irregular and occluded text regions across challenging places. To evaluate the proposed architecture's performance for VPR, we conducted several experiments on the challenging Self-Collected Text Place (SCTP) benchmark dataset. Initial experimental results show that the proposed method outperforms state-of-the-art (SOTA) methods in precision and recall on this benchmark.