Although a polygon represents text more accurately than an upright bounding box, polygon annotations are extremely expensive and difficult to produce. Unlike existing works that rely on fully supervised training with polygon annotations, this study proposes an unconstrained text detection system termed Polygon-free (PF), in which most existing polygon-based text detectors (e.g., PSENet [33], DB [16]) are trained with only upright bounding box annotations. Our core idea is to transfer knowledge from synthetic data to real data to enhance the supervision information of upright bounding boxes. This is made possible with a simple segmentation network, namely the Skeleton Attention Segmentation Network (SASN), which includes three vital components (i.e., channel attention, spatial attention and a skeleton attention map) and one soft cross-entropy loss. Experiments demonstrate that the proposed Polygon-free system can be combined with general detectors (e.g., EAST, PSENet, DB) to yield surprisingly high-quality pixel-level results with only upright bounding box annotations on a variety of datasets (e.g., ICDAR2019-Art, TotalText, ICDAR2015). For example, without using polygon annotations, PSENet achieves an 80.5% F-score on TotalText [3] (vs. 80.9% for its fully supervised counterpart), which is 31.1% better than training directly with upright bounding box annotations, and saves more than 80% of labeling costs. We hope that PF can provide a new perspective on reducing labeling costs in text detection. The code can be found at https://github.com/weijiawu/Unconstrained-Text-Detection-with-Box-Supervisionand-Dynamic-Self-Training.
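To make the gap between the two annotation types concrete, the following minimal sketch derives an upright bounding box from a polygon and measures how much of the box the text actually occupies. The function names and the sample polygon are hypothetical illustrations and are not taken from the released code.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def upright_bounding_box(polygon: List[Point]) -> Box:
    """Return the axis-aligned box that tightly encloses the polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def polygon_area(polygon: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    twice_area = 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def box_area(box: Box) -> float:
    x_min, y_min, x_max, y_max = box
    return (x_max - x_min) * (y_max - y_min)


# Hypothetical slanted text region (a parallelogram), for illustration only.
polygon = [(0, 0), (4, 2), (4, 4), (0, 2)]
box = upright_bounding_box(polygon)

# Fraction of the upright box that is actually covered by the text polygon.
fill_ratio = polygon_area(polygon) / box_area(box)
print(box, fill_ratio)
```

For this slanted region the polygon fills only half of its upright box, so a detector trained on the box alone would treat a large share of background pixels as text. This is the kind of supervision gap that the abstract describes PF as addressing.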