Text in natural images is of arbitrary orientations, requiring detection in terms of oriented bounding boxes. A multi-oriented text detector typically involves two key tasks: 1) text presence detection, which is a classification problem disregarding text orientation; 2) oriented bounding box regression, which is sensitive to text orientation. Previous methods rely on shared features for both tasks, resulting in degraded performance due to the incompatibility of the two tasks. To address this issue, we propose to perform classification and regression on features of different characteristics, extracted by two network branches of different designs. Concretely, the regression branch extracts rotation-sensitive features by actively rotating the convolutional filters, while the classification branch extracts rotation-invariant features by pooling the rotation-sensitive features. The proposed method, named Rotation-sensitive Regression Detector (RRD), achieves state-of-the-art performance on several oriented scene text benchmark datasets, including ICDAR 2015, MSRA-TD500, RCTW-17 and COCO-Text. Furthermore, RRD achieves a significant improvement on a ship collection dataset, demonstrating its generality to oriented object detection.
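To make the two-branch idea concrete, below is a minimal PyTorch sketch of the underlying mechanism, not the paper's implementation: a canonical convolutional filter is applied at several rotated orientations, the regression branch keeps the orientation-indexed (rotation-sensitive) responses, and the classification branch max-pools over orientations to obtain approximately rotation-invariant features. The class name `RotationSensitiveConv`, the 8-orientation setting, and the bilinear kernel rotation are illustrative assumptions.

```python
import math

import torch
import torch.nn.functional as F


def rotate_kernel(weight, angle):
    # Rotate a conv kernel of shape (out_ch, in_ch, k, k) by `angle` radians
    # using bilinear sampling (a simple stand-in for actively rotating filters).
    out_c, in_c, k, _ = weight.shape
    cos, sin = math.cos(angle), math.sin(angle)
    theta = torch.tensor([[cos, -sin, 0.0],
                          [sin,  cos, 0.0]]).unsqueeze(0).repeat(out_c * in_c, 1, 1)
    grid = F.affine_grid(theta, (out_c * in_c, 1, k, k), align_corners=False)
    rotated = F.grid_sample(weight.view(-1, 1, k, k), grid, align_corners=False)
    return rotated.view(out_c, in_c, k, k)


class RotationSensitiveConv(torch.nn.Module):
    """Apply one learned filter at K orientations to produce rotation-sensitive features."""

    def __init__(self, in_ch, out_ch, k=3, num_orient=8):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.01)
        self.num_orient = num_orient

    def forward(self, x):
        responses = []
        for i in range(self.num_orient):
            angle = 2 * math.pi * i / self.num_orient
            w = rotate_kernel(self.weight, angle)
            responses.append(F.conv2d(x, w, padding=self.weight.shape[-1] // 2))
        # Rotation-sensitive features: (B, K, C, H, W), one slice per orientation.
        return torch.stack(responses, dim=1)


# Toy usage: the regression branch keeps orientation channels for oriented box
# regression; the classification branch pools over orientations so that the
# text/non-text decision is (approximately) invariant to rotation.
feat = RotationSensitiveConv(3, 16)(torch.randn(2, 3, 64, 64))
reg_feat = feat.flatten(1, 2)      # (B, K*C, H, W): rotation-sensitive regression features
cls_feat = feat.max(dim=1).values  # (B, C, H, W): orientation-pooled classification features
```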