Existing methods for arbitrary-shaped text detection in natural scenes face two critical issues: 1) fractured detections at the gaps within a text instance; and 2) inaccurate detections of arbitrary-shaped text instances with diverse background context. To address these issues, we propose a novel method named Intra- and Inter-Instance Collaborative Learning (I3CL). Specifically, to address the first issue, we design an effective convolutional module with multiple receptive fields, which collaboratively learns better character and gap feature representations at local and long ranges inside a text instance. To address the second issue, we devise an instance-based transformer module to exploit the dependencies between different text instances and a global context module to exploit the semantic context from the shared background; together, these modules learn a more discriminative text feature representation. In this way, I3CL effectively exploits intra- and inter-instance dependencies in a unified end-to-end trainable framework. In addition, to make full use of unlabeled data, we design an effective semi-supervised learning method that leverages pseudo labels via an ensemble strategy. Without bells and whistles, experimental results show that the proposed I3CL sets new state-of-the-art results on three challenging public benchmarks: an F-measure of 77.5% on ICDAR2019-ArT, 86.9% on Total-Text, and 86.4% on CTW-1500. Notably, I3CL with the ResNeSt-101 backbone ranked first on the ICDAR2019-ArT leaderboard. The source code will be available at https://github.com/ViTAE-Transformer/ViTAE-Transformer-Scene-Text-Detection.
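The abstract reports results as an F-measure without defining it. As a minimal sketch, assuming the standard definition used in text detection benchmarks (the harmonic mean of precision and recall), the metric can be computed as follows. The function name `f_measure` and the precision and recall values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed standard F-measure)."""
    if precision + recall == 0:
        # Both scores are zero: define the F-measure as 0 to avoid dividing by zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical values for illustration only; NOT figures from the paper.
example_precision = 0.9
example_recall = 0.6
print(f"{f_measure(example_precision, example_recall):.4f}")  # prints 0.7200
```

Because the harmonic mean is pulled toward the smaller of the two inputs, a detector cannot reach a high F-measure by maximizing only precision or only recall.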