Recent advances in handwritten text recognition have made it possible to recognize whole documents in an end-to-end way: the Document Attention Network (DAN) recognizes characters one after another through an attention-based prediction process until the end of the document is reached. However, this autoregressive process leads to inference that cannot benefit from any parallelization optimization. In this paper, we propose Faster DAN, a two-step strategy to speed up the recognition process at prediction time: the model first predicts the first character of each text line in the document, and then completes all the text lines in parallel through multi-target queries and a specific document positional encoding scheme. Faster DAN reaches competitive results compared to the standard DAN, while being at least 4 times faster on whole single-page and double-page images of the RIMES 2009, READ 2016, and MAURDOR datasets. Source code and trained model weights are available at https://github.com/FactoDeepLearning/FasterDAN.
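The following is a minimal sketch of the two-step decoding idea described above; it is not the authors' implementation. It assumes a hypothetical `decoder` callable mapping (image `features`, token `queries`) to per-position character logits, and hypothetical special-token ids `sos`, `eol`, and `eod`; the model is assumed to have been trained for this two-pass behavior.

```python
# Illustrative two-step decoding in the spirit of Faster DAN (a sketch,
# not the paper's code). Step 1 autoregressively predicts the first
# character of every text line; step 2 advances all lines in parallel,
# one character per line per decoder call (multi-target queries).
import torch

def faster_dan_decode(decoder, features, sos, eol, eod, max_chars=1000):
    # --- Step 1: autoregressive pass over line starts ---------------
    # Each prediction is the first character of the next text line,
    # until the end-of-document token <eod> is produced.
    first_chars = []
    tokens = [sos]
    while len(tokens) < max_chars:
        logits = decoder(features, torch.tensor([tokens]))  # [1, T, V]
        nxt = int(logits[0, -1].argmax())
        if nxt == eod:
            break
        first_chars.append(nxt)
        tokens.append(nxt)

    # --- Step 2: parallel pass over all lines -----------------------
    # Every line is a separate query stream; a document positional
    # encoding (line index, index within line) is assumed to be applied
    # inside `decoder`, so one call extends every active line at once.
    lines = [[c] for c in first_chars]
    active = set(range(len(lines)))
    for _ in range(max_chars):
        if not active:
            break
        batch = torch.nn.utils.rnn.pad_sequence(
            [torch.tensor(line) for line in lines], batch_first=True)
        logits = decoder(features, batch)  # features shared across lines
        for i in list(active):
            nxt = int(logits[i, len(lines[i]) - 1].argmax())
            if nxt == eol:
                active.remove(i)  # line finished
            else:
                lines[i].append(nxt)
    return lines
```

Under these assumptions, the number of sequential decoder calls scales with the number of lines plus the length of the longest line, rather than with the total number of characters in the document, which is where the reported speed-up comes from.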