Most existing scene text detectors require large-scale training data, an approach that scales poorly for two reasons: 1) scene text images often follow domain-specific distributions; 2) collecting large-scale annotated scene text images is laborious. We study domain adaptive scene text detection, a largely neglected yet meaningful task that aims to transfer optimally from labelled scene text images while handling unlabelled images in various new domains. Specifically, we design SCAST, a subcategory-aware self-training technique that effectively mitigates network overfitting and noisy pseudo labels in domain adaptive scene text detection. SCAST consists of two novel designs. For labelled source data, it introduces pseudo subcategories for both foreground texts and background stuff, which helps train more generalizable source models with multi-class detection objectives. For unlabelled target data, it mitigates network overfitting by co-regularizing the binary and subcategory classifiers trained in the source domain. Extensive experiments show that SCAST consistently achieves superior detection performance across multiple public benchmarks, and that it also generalizes well to other domain adaptive detection tasks such as vehicle detection.
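The abstract does not spell out how the binary and subcategory classifiers are co-regularized, so the following is only an illustrative sketch of one plausible reading: a target-domain pseudo label is kept only when the binary text/background classifier and the multi-class subcategory classifier roughly agree. All function names, the `max_gap` and `threshold` parameters, and the convention that foreground subcategories come first in the logit vector are assumptions made for this example, not details taken from SCAST.

```python
import math


def sigmoid(x):
    """Logistic function for the binary text/background logit."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    """Numerically stable softmax over subcategory logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def foreground_prob_from_subcategories(sub_logits, num_fg_sub):
    """Collapse subcategory probabilities into one text probability.

    Assumption: the first `num_fg_sub` entries are foreground (text)
    pseudo subcategories and the rest are background subcategories.
    """
    probs = softmax(sub_logits)
    return sum(probs[:num_fg_sub])


def co_regularized_pseudo_label(binary_logit, sub_logits, num_fg_sub,
                                max_gap=0.2, threshold=0.5):
    """Return 1 (text), 0 (background), or None (discard the sample).

    Hypothetical rule: if the two classifiers disagree by more than
    `max_gap`, the prediction is treated as a noisy pseudo label and
    dropped; otherwise their averaged probability is thresholded.
    """
    p_bin = sigmoid(binary_logit)
    p_sub = foreground_prob_from_subcategories(sub_logits, num_fg_sub)
    if abs(p_bin - p_sub) > max_gap:
        return None
    return 1 if 0.5 * (p_bin + p_sub) >= threshold else 0


if __name__ == "__main__":
    # Two foreground and two background pseudo subcategories (made-up logits).
    print(co_regularized_pseudo_label(4.0, [3.0, 2.0, -2.0, -3.0], 2))
    print(co_regularized_pseudo_label(4.0, [-3.0, -2.0, 2.0, 3.0], 2))
```

In this sketch, disagreement between the two heads acts as the filter for noisy pseudo labels; a training-time variant could instead penalize the gap between the two probabilities as a loss term, which is equally an assumption here.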