Text detection and recognition are essential components of a modern OCR system. Most OCR approaches attempt to obtain accurate text bounding boxes at the detection stage, which are then used as the input to the text recognition stage. We observe that when tight text bounding boxes are used as input, a text recognizer frequently fails to achieve optimal performance because of the inconsistency between the bounding boxes and the deep representations learned by text recognition models. In this paper, we propose Box Adjuster, a reinforcement learning-based method that adjusts the shape of each text bounding box to make it more compatible with text recognition models. Additionally, in cross-domain settings such as synthetic-to-real adaptation, the proposed method significantly reduces the mismatch in domain distribution between the source and target domains. Experiments demonstrate that the performance of end-to-end text recognition systems improves when the adjusted bounding boxes are used as ground truths for training. Specifically, on several benchmark datasets for scene text understanding, the proposed method outperforms state-of-the-art text spotters by an average of 2.0% F-score on end-to-end text recognition tasks and 4.6% F-score on domain adaptation tasks.
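The abstract does not specify the state, action, or reward design behind Box Adjuster, so the following is only a hypothetical illustration of how box adjustment can be framed as a sequential decision problem. It assumes axis-aligned boxes, a discrete action set that moves one edge by one pixel, and a reward equal to the change in a recognizer score. The functions `apply_action`, `adjust_box`, and `toy_recognizer_score` are invented for this sketch, and a greedy one-step policy stands in for a learned reinforcement learning policy.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels

# Hypothetical discrete action space: shift one edge (index 0..3) by -1 or +1 step.
ACTIONS: List[Tuple[int, int]] = [(edge, sign) for edge in range(4) for sign in (-1, 1)]


def apply_action(box: Box, action: Tuple[int, int], step: int = 1) -> Box:
    """Return a new box with one edge moved; moves that collapse the box are rejected."""
    edge, sign = action
    coords = list(box)
    coords[edge] += sign * step
    x0, y0, x1, y1 = coords
    if x0 >= x1 or y0 >= y1:
        return box  # degenerate box: keep the current one
    return (x0, y0, x1, y1)


def adjust_box(box: Box, score: Callable[[Box], float], max_steps: int = 20) -> Box:
    """Greedy rollout (a stand-in for a trained policy, not the paper's method).

    The reward of an action is the change in `score`; the loop stops when no
    action yields a positive reward or after `max_steps` moves.
    """
    for _ in range(max_steps):
        current = score(box)
        best_box, best_reward = box, 0.0
        for action in ACTIONS:
            candidate = apply_action(box, action)
            reward = score(candidate) - current
            if reward > best_reward:
                best_box, best_reward = candidate, reward
        if best_box == box:
            break
        box = best_box
    return box


def toy_recognizer_score(box: Box) -> float:
    # Hypothetical stand-in for recognizer confidence: it peaks when the tight
    # box (10, 20, 50, 30) is padded by 2 px on every side.
    preferred = (8, 18, 52, 32)
    return -float(sum(abs(a - b) for a, b in zip(box, preferred)))


tight = (10, 20, 50, 30)
print(adjust_box(tight, toy_recognizer_score))  # (8, 18, 52, 32)
```

In a real system the toy score would be replaced by a signal from a text recognizer, and the greedy loop by a policy trained with a reinforcement learning algorithm; the sketch only shows why a tight box is not necessarily the box a recognizer prefers.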