To pursue comprehensive performance, recent text detectors improve detection speed at the expense of accuracy. They adopt shrink-mask based text representations, which makes detection accuracy highly dependent on the quality of the shrink-masks. Unfortunately, three problems make shrink-masks unreliable. First, these methods rely on semantic information to discriminate shrink-masks from the background, but the feature defocusing phenomenon, in which coarse layers are optimized by fine-grained objectives, limits the extraction of semantic features. Second, because both shrink-masks and margins belong to text, the detail loss phenomenon, in which the margins are ignored, makes it hard to distinguish shrink-masks from margins and leads to ambiguous shrink-mask edges. Third, false-positive samples share similar visual features with shrink-masks, which further degrades shrink-mask recognition. To address these problems, we propose the Zoom Text Detector (ZTD), inspired by the zoom process of a camera. Specifically, a Zoom Out Module (ZOM) provides coarse-grained optimization objectives for coarse layers to avoid feature defocusing. A Zoom In Module (ZIM) enhances margin recognition to prevent detail loss. Furthermore, a Sequential-Visual Discriminator (SVD) suppresses false-positive samples using sequential and visual features. Experiments verify the superior comprehensive performance of ZTD.