Synthetic data has proven effective for scene text detection and recognition tasks. However, two problems remain. First, the color schemes used for text coloring in existing methods are relatively fixed color key-value pairs learned from real datasets; dirty data in those datasets can make the colors of the text and the background too similar to distinguish. Second, the generated text is uniformly confined to a single depth plane of an image, whereas in the real world text may appear across depths. To address these problems, in this paper we design a novel method for generating color schemes that is consistent with how the human eye perceives color. The advantages of our method are as follows: (1) it overcomes the color confusion between text and background caused by dirty data; (2) the generated text can appear in most locations of any image, even across depths; (3) it avoids analyzing the depth of the background, allowing its performance to exceed that of state-of-the-art methods; (4) image generation is fast, at roughly one image every three milliseconds. The effectiveness of our method is verified on several public datasets.
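The abstract does not specify how text/background distinguishability is enforced, so the following is only a minimal illustrative sketch of one standard approach: converting both colors to CIELAB and rejecting text colors whose CIE76 delta-E distance from the background falls below a threshold. The `is_legible` helper and its threshold of 30 are assumptions for demonstration, not the paper's actual method.

```python
import math

def _srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIELAB (D65 white point)."""
    def linearize(c):
        c /= 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linearize(c) for c in rgb)
    # Linear RGB -> XYZ using the standard sRGB matrix (D65)
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    xn, yn, zn = 0.95047, 1.0, 1.08883  # D65 reference white
    def f(t):
        return t ** (1 / 3) if t > (6 / 29) ** 3 else t / (3 * (6 / 29) ** 2) + 4 / 29
    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)

def delta_e(rgb1, rgb2):
    """CIE76 color difference: Euclidean distance in CIELAB space."""
    return math.dist(_srgb_to_lab(rgb1), _srgb_to_lab(rgb2))

def is_legible(text_rgb, bg_rgb, threshold=30.0):
    """Accept a text color only if it is perceptually far from the background.
    The threshold value is an illustrative assumption."""
    return delta_e(text_rgb, bg_rgb) >= threshold
```

A synthetic-data generator could resample a candidate text color until `is_legible` passes, which would filter out the near-identical text/background color pairs that dirty training data can otherwise produce.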