Scene text detection is a challenging field that requires complex data annotation, which is time-consuming and expensive. Techniques such as weak supervision can reduce the amount of annotated data needed. In this paper, we propose a weak supervision method for scene text detection that makes use of reinforcement learning (RL). The reward received by the RL agent is estimated by a neural network instead of being inferred from ground-truth labels. First, we enhance an existing supervised RL approach to text detection with several training optimizations, which allows us to close the performance gap to regression-based algorithms. We then apply the proposed system to weakly and semi-supervised training on real-world data. Our results show that training in a weakly supervised setting is feasible. However, we find that using our model in a semi-supervised setting, e.g., when combining labeled synthetic data with unannotated real-world data, produces the best results.
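The key difference between the supervised and weakly supervised settings is where the agent's reward comes from. The sketch below contrasts the two under stated assumptions. It is not the authors' implementation. The supervised reward is assumed to be the best intersection over union (IoU) between a predicted box and any annotated box. The learned reward is a hypothetical `estimator` callable that stands in for the reward network and scores a box from the image alone. The names `Box`, `RewardEstimator`, `supervised_reward` and `estimated_reward`, as well as the clamping to [0, 1], are illustrative assumptions.

```python
from typing import Callable, Sequence, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max), axis-aligned.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def supervised_reward(pred: Box, gt_boxes: Sequence[Box]) -> float:
    """Reward inferred from ground-truth labels (assumed: best IoU with any label)."""
    return max((iou(pred, g) for g in gt_boxes), default=0.0)


# Hypothetical stand-in for the reward network: (image, box) -> quality score.
RewardEstimator = Callable[[object, Box], float]


def estimated_reward(pred: Box, image: object, estimator: RewardEstimator) -> float:
    """Reward predicted by a learned estimator; no ground-truth labels are used.

    The score is clamped to [0, 1] so it lies on the same scale as IoU
    (an assumption made for this sketch).
    """
    return max(0.0, min(1.0, estimator(image, pred)))
```

For example, `supervised_reward((12, 10, 50, 30), [(10, 10, 50, 30)])` evaluates to 0.95, because the intersection covers 760 of the 800 pixels in the union. In the weakly supervised case, the same agent would call `estimated_reward` with a trained estimator and would not need any annotations. Keeping both rewards on a common scale is intended to let the agent's training loop stay unchanged when the reward source is swapped.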