In arbitrary-shape text detection, locating text boundaries accurately is challenging. Existing methods often model text boundaries only indirectly or rely on complex post-processing. In this paper, we present a unified coarse-to-fine framework based on boundary learning for arbitrary-shape text detection, which locates text boundaries accurately and efficiently without post-processing. We explicitly model the text boundary with an iterative boundary transformer that works in a coarse-to-fine manner, so the method obtains accurate text boundaries directly and improves efficiency by dropping complex post-processing. Specifically, the method consists mainly of a feature extraction backbone, a boundary proposal module, and an iteratively optimized boundary transformer module. The boundary proposal module, built from multi-layer dilated convolutions, computes prior information (a classification map, a distance field, and a direction field) that is used to generate coarse boundary proposals and to guide the optimization of the boundary transformer. The boundary transformer module adopts an encoder-decoder structure: the encoder is built from multi-layer transformer blocks with residual connections, and the decoder is a simple multi-layer perceptron (MLP). Guided by the prior information, the boundary transformer module gradually refines the coarse boundary proposals through iterative boundary deformation. Furthermore, we propose a novel boundary energy loss (BEL), which introduces an energy minimization constraint and a monotonically decreasing energy constraint to further optimize and stabilize the learning of boundary refinement. Extensive experiments on challenging, publicly available datasets demonstrate the state-of-the-art performance and promising efficiency of our method.
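The coarse-to-fine refinement and the two constraints behind the boundary energy loss can be illustrated with a toy sketch. This is not the paper's implementation, and the abstract does not give the loss formula. The sketch assumes a circular text region whose distance to the true boundary acts as the energy, and it replaces the learned boundary transformer with a hand-written `deform` step. All function names, the `step` and `weight` parameters, and the way the two terms are combined in `boundary_energy_loss` are hypothetical.

```python
import math

def boundary_energy(points, center, radius):
    """Toy energy: mean distance from the control points to a circular 'text' boundary."""
    cx, cy = center
    return sum(abs(math.hypot(x - cx, y - cy) - radius) for x, y in points) / len(points)

def deform(points, center, radius, step=0.5):
    """One deformation step: move each point a fraction `step` of the way to the
    target boundary. Hand-written stand-in for offsets a learned model would predict."""
    cx, cy = center
    moved = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        dist = math.hypot(dx, dy)
        if dist == 0.0:
            moved.append((x, y))  # radial direction undefined; leave the point in place
            continue
        scale = 1.0 + step * (radius - dist) / dist
        moved.append((cx + dx * scale, cy + dy * scale))
    return moved

def refine(points, center, radius, iterations=3):
    """Iteratively deform a coarse proposal, recording the energy after each pass."""
    energies = [boundary_energy(points, center, radius)]
    for _ in range(iterations):
        points = deform(points, center, radius)
        energies.append(boundary_energy(points, center, radius))
    return points, energies

def monotonic_penalty(energies):
    """Sum of energy increases between consecutive iterations (zero if it never rises)."""
    return sum(max(0.0, later - earlier) for earlier, later in zip(energies, energies[1:]))

def boundary_energy_loss(energies, weight=1.0):
    """Assumed combination (not the paper's formula): mean energy over the refinement
    iterations plus a weighted penalty for any increase in energy."""
    after_refinement = energies[1:]
    return sum(after_refinement) / len(after_refinement) + weight * monotonic_penalty(energies)

# Coarse proposal: a square of 8 control points around a unit-circle "text" region.
coarse = [(2, 0), (2, 2), (0, 2), (-2, 2), (-2, 0), (-2, -2), (0, -2), (2, -2)]
refined, energies = refine(coarse, center=(0.0, 0.0), radius=1.0)
loss = boundary_energy_loss(energies)
```

In this toy setting every pass halves the remaining distance to the boundary, so the recorded energies decrease and the monotonic penalty stays at zero. A sequence in which a later iteration made the boundary worse would add a positive penalty, which is the behaviour the monotonically decreasing constraint is meant to discourage.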