Scene text detection remains a grand challenge due to the variation in text curvatures, orientations, and aspect ratios. One of the hardest problems in this task is how to represent text instances of arbitrary shapes. Although many methods have been proposed to model irregular text flexibly, most of them sacrifice simplicity and robustness: their complicated post-processing and their regression under a Dirac delta distribution undermine both detection performance and generalization ability. In this paper, we propose an efficient text instance representation named CentripetalText (CT), which decomposes each text instance into a combination of a text kernel and centripetal shifts. Specifically, we use the centripetal shifts to implement pixel aggregation, guiding the external text pixels to the internal text kernels. A relaxation operation is integrated into the dense regression of centripetal shifts, so that a prediction counts as correct anywhere within a range rather than only at one specific value. The convenient reconstruction of text contours and the tolerance of prediction errors in our method guarantee high detection accuracy and fast inference speed, respectively. In addition, we shrink our text detector into a proposal generation module, namely the CentripetalText Proposal Network, which replaces the Segmentation Proposal Network in Mask TextSpotter v3 and produces more accurate proposals. To validate the effectiveness of our method, we conduct experiments on several commonly used scene text benchmarks, including both curved and multi-oriented text datasets. For scene text detection, our approach achieves superior or competitive performance compared with existing methods, e.g., an F-measure of 86.3% at 40.0 FPS on Total-Text and an F-measure of 86.1% at 34.8 FPS on MSRA-TD500. For end-to-end scene text recognition, our method outperforms Mask TextSpotter v3 by 1.1% on Total-Text.
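The two core ideas above, pixel aggregation via centripetal shifts and relaxed shift regression, can be illustrated with a small NumPy sketch. It is not the authors' implementation, and several details are assumptions: shifts are stored as a per-pixel `(dy, dx)` offset, kernels are already labelled as connected components (0 = background), shifted positions are rounded to the nearest pixel, and the relaxation is modelled as dropping the L1 penalty for any pixel whose predicted shift already lands inside its own ground-truth kernel. The function names `aggregate_pixels` and `relaxed_shift_loss` are hypothetical.

```python
import numpy as np


def aggregate_pixels(text_mask, kernel_labels, shifts):
    """Hypothetical sketch: assign each text pixel to the kernel its shift points into.

    text_mask:     (H, W) bool, predicted text region.
    kernel_labels: (H, W) int, 0 = background, k > 0 = id of the k-th text kernel.
    shifts:        (H, W, 2) float, predicted (dy, dx) centripetal shift per pixel (assumed layout).
    Returns an (H, W) int map of instance ids, 0 = unassigned.
    """
    h, w = text_mask.shape
    ys, xs = np.nonzero(text_mask)
    # Landing position of every text pixel after its shift, rounded and clipped to the image.
    ty = np.clip(np.rint(ys + shifts[ys, xs, 0]).astype(int), 0, h - 1)
    tx = np.clip(np.rint(xs + shifts[ys, xs, 1]).astype(int), 0, w - 1)
    instances = np.zeros((h, w), dtype=int)
    # A pixel takes the id of the kernel it lands in; landing on background leaves it at 0.
    instances[ys, xs] = kernel_labels[ty, tx]
    # Kernel pixels always keep their own kernel id.
    inside = kernel_labels > 0
    instances[inside] = kernel_labels[inside]
    return instances


def relaxed_shift_loss(gt_instances, gt_kernels, pred_shifts, gt_shifts):
    """Hypothetical relaxed L1 loss: a shift is 'correct' anywhere inside the right kernel.

    gt_instances: (H, W) int ground-truth instance id per pixel (0 = background).
    gt_kernels:   (H, W) int ground-truth kernel map using the same ids.
    pred_shifts, gt_shifts: (H, W, 2) float (dy, dx) shifts.
    """
    h, w = gt_instances.shape
    ys, xs = np.nonzero(gt_instances)
    ty = np.clip(np.rint(ys + pred_shifts[ys, xs, 0]).astype(int), 0, h - 1)
    tx = np.clip(np.rint(xs + pred_shifts[ys, xs, 1]).astype(int), 0, w - 1)
    # 1.0 where the predicted shift misses the pixel's own kernel, 0.0 where it is within range.
    miss = (gt_kernels[ty, tx] != gt_instances[ys, xs]).astype(float)
    l1 = np.abs(pred_shifts[ys, xs] - gt_shifts[ys, xs]).sum(axis=1)
    # Only missing pixels are penalised; the average is taken over all text pixels.
    return float((miss * l1).sum() / max(len(ys), 1))
```

For example, on a 1×6 strip with kernels `[[0, 1, 0, 0, 2, 0]]` and horizontal shifts `[1, 0, -1, 1, 0, -1]`, aggregation yields `[[1, 1, 1, 2, 2, 2]]`. Adding 0.3 to every predicted shift still gives a relaxed loss of 0, whereas a plain L1 loss would penalise each pixel; this is the "range instead of a specific value" behaviour the relaxation is meant to provide.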