Deep-network text detectors generally need pre-training and long training schedules to perform well. In this paper, we present a new scene text detection network, FANet, with Fast convergence and Accurate text localization. FANet is an end-to-end text detector built on transformer feature learning and normalized Fourier descriptor modeling, in which a Fourier Descriptor Proposal Network and an Iterative Text Decoding Network are designed to identify text proposals efficiently and accurately. We also propose a Dense Matching Strategy and a well-designed loss function to optimize network performance. Extensive experiments demonstrate that FANet achieves SOTA performance with fewer training epochs and no pre-training. When additional data is introduced for pre-training, FANet achieves SOTA performance on MSRATD500, CTW1500 and TotalText. Ablation experiments further verify the effectiveness of our contributions.