Arbitrary shape text detection is a challenging task due to the complexity and variety of text, e.g., various scales, random rotations, and curved shapes. In this paper, we propose an arbitrary shape text detector with a boundary transformer, which can accurately and directly locate text boundaries without any post-processing. Our method mainly consists of a boundary proposal module and an iteratively optimized boundary transformer module. The boundary proposal module, consisting of multi-layer dilated convolutions, computes important prior information (including a classification map, a distance field, and a direction field) that is used both to generate coarse boundary proposals and to guide the optimization of the boundary transformer. The boundary transformer module adopts an encoder-decoder structure, in which the encoder is built from multi-layer transformer blocks with residual connections and the decoder is a simple multi-layer perceptron (MLP). Guided by the prior information, the boundary transformer module gradually refines the coarse boundary proposals through iterative boundary deformation. Furthermore, we propose a novel boundary energy loss (BEL), which introduces an energy minimization constraint and a monotonic energy-decrease constraint for every boundary optimization step. Extensive experiments on publicly available and challenging datasets demonstrate the state-of-the-art performance and promising efficiency of our method.
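To make the iterative deformation and the two BEL constraints more concrete, here is a toy sketch in plain Python. It is not the authors' implementation, and several pieces are assumptions. The energy of a boundary is taken to be the mean ground-truth distance-field value sampled at its vertices, which is zero on the true text boundary. The learned transformer encoder and MLP decoder are replaced by a hypothetical hand-written `toy_refiner` that moves each vertex one pixel toward the centre. The loss sums, over refinement steps, the energy E_t plus a penalty max(0, E_t - E_{t-1}) whenever a step increases the energy. The paper's exact BEL formulation may differ.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Boundary = List[Point]
Grid = List[List[float]]


def sample(field: Grid, p: Point) -> float:
    """Nearest-neighbour lookup of a 2-D field at (x, y), clamped to the grid."""
    h, w = len(field), len(field[0])
    x = min(max(int(round(p[0])), 0), w - 1)
    y = min(max(int(round(p[1])), 0), h - 1)
    return field[y][x]


def energy(boundary: Boundary, dist_gt: Grid) -> float:
    """Assumed boundary energy: mean GT distance-field value at the vertices."""
    return sum(sample(dist_gt, p) for p in boundary) / len(boundary)


def refine(
    boundary: Boundary,
    refiner: Callable[[Boundary], List[Point]],
    steps: int,
) -> List[Boundary]:
    """Deform the boundary `steps` times; return every intermediate boundary."""
    history = [boundary]
    for _ in range(steps):
        current = history[-1]
        offsets = refiner(current)  # per-vertex (dx, dy) offsets
        history.append([(x + dx, y + dy) for (x, y), (dx, dy) in zip(current, offsets)])
    return history


def boundary_energy_loss(history: List[Boundary], dist_gt: Grid) -> float:
    """Per step: energy term E_t plus penalty max(0, E_t - E_{t-1}) if energy rose."""
    energies = [energy(b, dist_gt) for b in history]
    loss = 0.0
    for t in range(1, len(energies)):
        loss += energies[t] + max(0.0, energies[t] - energies[t - 1])
    return loss


# Toy GT distance field: Chebyshev distance to the square with sides at x, y in {2, 7}.
dist_gt: Grid = [
    [abs(max(abs(x - 4.5), abs(y - 4.5)) - 2.5) for x in range(10)] for y in range(10)
]

# Coarse proposal: the image border, represented by four corner vertices.
proposal: Boundary = [(0.0, 0.0), (9.0, 0.0), (9.0, 9.0), (0.0, 9.0)]


def toy_refiner(boundary: Boundary) -> List[Point]:
    """Hypothetical stand-in for encoder + MLP: step one pixel toward the centre."""
    return [(1.0 if x < 4.5 else -1.0, 1.0 if y < 4.5 else -1.0) for x, y in boundary]


if __name__ == "__main__":
    two_steps = refine(proposal, toy_refiner, 2)
    print([energy(b, dist_gt) for b in two_steps])  # [2.0, 1.0, 0.0]
    print(boundary_energy_loss(two_steps, dist_gt))  # 1.0
    print(boundary_energy_loss(refine(proposal, toy_refiner, 3), dist_gt))  # 3.0
```

With two steps, the energies fall from 2.0 to 1.0 to 0.0 and the loss is 1.0. A third step overshoots the true boundary and raises the energy back to 1.0. Both the energy term and the monotonicity penalty then fire, so the loss becomes 3.0.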