The attention mechanism has become the de facto module in scene text recognition (STR) methods, owing to its capability of extracting character-level representations. These methods can be categorized into implicit-attention-based and supervised-attention-based, depending on how the attention is computed, i.e., implicit attention and supervised attention are learned from sequence-level text annotations and character-level bounding-box annotations, respectively. Implicit attention, which may extract coarse or even incorrect spatial regions as character attention, is prone to alignment-drift issues. Supervised attention can alleviate this issue, but it is category-specific: it requires extra, laborious character-level bounding-box annotations and becomes memory-intensive when the number of character categories is large. To address these issues, we propose a novel attention mechanism for STR, self-supervised implicit glyph attention (SIGA). SIGA delineates the glyph structures of text images through joint self-supervised text segmentation and implicit attention alignment, which serve as supervision to improve attention correctness without extra character-level annotations. Experimental results demonstrate that SIGA performs consistently and significantly better than previous attention-based STR methods, in terms of both attention correctness and final recognition performance, on publicly available context benchmarks and our contributed contextless benchmarks.
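To make the distinction concrete, the sketch below is a minimal PyTorch illustration, under assumed shapes and module names (ImplicitAttentionDecoder, attention_alignment_loss are hypothetical, not the exact SIGA design): an implicit attention decoder learns per-step spatial attention maps from sequence-level labels only, and an auxiliary alignment loss encourages each map to agree with a pseudo glyph mask obtained from self-supervised text segmentation, so no character-level bounding-box annotations are needed.

```python
# Illustrative sketch only; assumes a CNN backbone producing (B, C, H, W) features.
import torch
import torch.nn as nn

class ImplicitAttentionDecoder(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=256, num_classes=97, max_len=25):
        super().__init__()
        self.max_len = max_len
        self.query_embed = nn.Embedding(max_len, hidden_dim)      # one query per decoding step
        self.key_proj = nn.Conv2d(feat_dim, hidden_dim, kernel_size=1)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, feats):
        # feats: (B, C, H, W) visual features from the backbone
        B, C, H, W = feats.shape
        keys = self.key_proj(feats).flatten(2)                    # (B, D, H*W)
        queries = self.query_embed.weight.unsqueeze(0).expand(B, -1, -1)  # (B, T, D)
        attn = torch.softmax(torch.bmm(queries, keys), dim=-1)    # (B, T, H*W) implicit attention
        glimpses = torch.bmm(attn, feats.flatten(2).transpose(1, 2))       # (B, T, C)
        logits = self.classifier(glimpses)                        # (B, T, num_classes)
        return logits, attn.view(B, self.max_len, H, W)

def attention_alignment_loss(attn_maps, glyph_masks, valid, eps=1e-6):
    # attn_maps:   (B, T, H, W), each map sums to 1 over the spatial grid
    # glyph_masks: (B, T, H, W), pseudo glyph masks from self-supervised segmentation
    # valid:       (B, T) float mask, 1 for real characters, 0 for padding steps
    target = glyph_masks / glyph_masks.sum(dim=(2, 3), keepdim=True).clamp(min=eps)
    kl = (target * (target.clamp(min=eps).log()
                    - attn_maps.clamp(min=eps).log())).sum(dim=(2, 3))
    return (kl * valid).sum() / valid.sum().clamp(min=1)
```

In this sketch, the recognition loss (cross-entropy on `logits` against sequence-level labels) trains the implicit attention, while `attention_alignment_loss` acts as the glyph-level supervision; how the pseudo glyph masks are produced and weighted against the recognition loss is left out here and is where the method's specifics lie.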