Shape-robust text detection faces two challenges: 1) most existing detectors based on quadrangular bounding boxes have difficulty locating text of arbitrary shape, which is hard to enclose tightly in a rectangle; 2) most pixel-wise segmentation-based detectors may fail to separate text instances that are very close to each other. To address these problems, we propose a novel Progressive Scale Expansion Network (PSENet), a segmentation-based detector that produces multiple predictions for each text instance. These predictions correspond to different `kernels' obtained by shrinking the original text instance to various scales. The final detection is then obtained through our progressive scale expansion algorithm, which gradually expands the kernels of minimal scale into text instances of maximal, complete shape. Because there are large geometric margins between these minimal kernels, our method effectively distinguishes adjacent text instances and is robust to arbitrary shapes. State-of-the-art results on the ICDAR 2015 and ICDAR 2017 MLT benchmarks further confirm the effectiveness of PSENet. Notably, PSENet outperforms the previous best record by an absolute 6.37\% on the curved-text dataset SCUT-CTW1500. Code will be available at https://github.com/whai362/PSENet.
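The expansion step described in the abstract can be illustrated with a minimal sketch. It is not the released PSENet code: it assumes the network outputs have already been binarized into kernel masks ordered from smallest to largest scale, and the function names (`label_components`, `progressive_scale_expansion`), the 4-connectivity, and the first-come-first-served rule for contested pixels are all assumptions made for this illustration.

```python
from collections import deque

# 4-connected neighbourhood (an assumption of this sketch).
NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def label_components(mask):
    """Label the 4-connected foreground regions of a binary grid as 1, 2, ..."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and labels[y][x] == 0:
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in NEIGHBORS:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def progressive_scale_expansion(kernels):
    """Grow labels from the smallest kernel into each larger kernel in turn.

    `kernels` is a list of binary grids ordered from smallest to largest scale.
    """
    # Instances are separated once, on the smallest kernels.
    labels, count = label_components(kernels[0])
    h, w = len(labels), len(labels[0])
    for kernel in kernels[1:]:
        # Breadth-first expansion from every pixel that is already labelled.
        queue = deque((y, x) for y in range(h) for x in range(w) if labels[y][x])
        while queue:
            cy, cx = queue.popleft()
            for dy, dx in NEIGHBORS:
                ny, nx = cy + dy, cx + dx
                # An unlabelled pixel inside the larger kernel goes to the
                # label that reaches it first (hypothetical conflict rule).
                if (0 <= ny < h and 0 <= nx < w
                        and kernel[ny][nx] and labels[ny][nx] == 0):
                    labels[ny][nx] = labels[cy][cx]
                    queue.append((ny, nx))
    return labels, count


# Toy input: two small kernels whose full-scale regions touch each other.
small = [[0, 0, 0, 0, 0, 0],
         [0, 1, 0, 0, 1, 0],
         [0, 0, 0, 0, 0, 0]]
full = [[1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1]]

labels, count = progressive_scale_expansion([small, full])
for row in labels:
    print(row)
```

In this toy input the full-scale mask is a single connected blob, so plain connected-component labelling on it would report one instance. Starting from the two well-separated kernels instead keeps two labels, and each one grows outward until it meets the other.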