Arbitrary-shaped text detection is an important and challenging task in computer vision. Most existing methods require heavy labeling effort to produce polygon-level text region annotations for supervised training. To reduce labeling cost, we study weakly-supervised arbitrary-shaped text detection that combines various forms of weak supervision (e.g., image-level tags and coarse, loose, and tight bounding boxes), which are far easier to annotate. We propose an Expectation-Maximization (EM) based weakly-supervised learning framework that trains an accurate arbitrary-shaped text detector using only a small amount of polygon-level annotated data combined with a large amount of weakly annotated data. We also propose a contour-based arbitrary-shaped text detector that is well suited to weakly-supervised learning. Extensive experiments on three arbitrary-shaped text benchmarks (CTW1500, Total-Text, and ICDAR-ArT) show that (1) using only 10% strongly annotated data and 90% weakly annotated data, our method achieves performance comparable to state-of-the-art methods, and (2) with 100% strongly annotated data, our method outperforms existing methods on all three benchmarks. We will make the weakly annotated datasets publicly available in the future.