Text recognition is a popular research topic with many associated challenges. Despite considerable progress in recent years, the task itself remains confined to reading cropped text-line images and serves as a subtask of optical character recognition (OCR) systems. As a result, the final recognition result is limited by the performance of the text detector. In this paper, we propose a simple, elegant, and effective paradigm called Implicit Feature Alignment (IFA), which can be easily integrated into current text recognizers, resulting in a novel inference mechanism called IFA-inference. This enables an ordinary text recognizer to process multi-line text, so that text detection can be dispensed with entirely. Specifically, we integrate IFA into the two most prevalent text recognition streams (attention-based and CTC-based) and propose attention-guided dense prediction (ADP) and Extended CTC (ExCTC). Furthermore, we propose the Wasserstein-based Hollow Aggregation Cross-Entropy (WH-ACE), which suppresses negative predictions and thereby assists in training ADP and ExCTC. We experimentally demonstrate that IFA achieves state-of-the-art performance on end-to-end document recognition tasks while maintaining the fastest speed, and that ADP and ExCTC complement each other from the perspective of different application scenarios. Code will be available at https://github.com/WangTianwei/Implicit-feature-alignment.