Arbitrary text appearance poses a great challenge to scene text recognition. Existing work mostly addresses the problem from the perspective of shape distortion, including perspective distortion, line curvature, and other style variations; methods based on spatial transformers have therefore been studied extensively. However, chromatic difficulties in complex scenes have received little attention. In this work, we introduce a new learnable, geometry-unrelated module, the Structure-Preserving Inner Offset Network (SPIN), which enables color manipulation of the source data within the network. This differentiable module can be inserted before any recognition architecture to ease the downstream task, giving neural networks the ability to actively transform input intensity rather than relying solely on spatial rectification. It can also serve as a complement to known spatial transformations, working with them either independently or collaboratively. Extensive experiments show that SPIN yields significant improvements over state-of-the-art methods on multiple text recognition benchmarks.
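To give a rough sense of the kind of chromatic, geometry-free preprocessing the abstract describes, the sketch below maps pixel intensities through a weighted sum of power (gamma) curves whose weights would be predicted by a small network. This is a minimal illustrative assumption, not the authors' exact SPIN formulation; the function name, basis choice, and parameters are hypothetical.

```python
import numpy as np

def intensity_transform(img, weights, gammas):
    """Chromatic (geometry-free) mapping of pixel intensities:
    out = sum_k w_k * img**gamma_k, applied pointwise, so image
    structure (pixel positions) is untouched. Assumes img is
    normalized to [0, 1]; weights summing to 1 keep the output
    in range. Illustrative sketch only, not the SPIN module."""
    img = np.clip(img, 0.0, 1.0)
    out = np.zeros_like(img)
    for w, g in zip(weights, gammas):
        out += w * np.power(img, g)
    return np.clip(out, 0.0, 1.0)

# Example: brighten a dark, low-contrast patch before recognition.
patch = np.array([[0.1, 0.2], [0.3, 0.4]])
weights = [0.5, 0.5]   # in a learnable setting, predicted per image
gammas = [0.5, 1.0]    # fixed basis of gamma curves (illustrative)
enhanced = intensity_transform(patch, weights, gammas)
```

Because the mapping acts on intensities alone, it is differentiable in the weights and could in principle sit in front of any recognizer, which is the role the abstract assigns to SPIN.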