Handwritten mathematical expression recognition aims to automatically generate LaTeX sequences from given images. Currently, attention-based encoder-decoder models are widely used for this task. They typically generate target sequences in a left-to-right (L2R) manner, leaving the right-to-left (R2L) contexts unexploited. In this paper, we propose an Attention aggregation based Bi-directional Mutual learning Network (ABM), which consists of one shared encoder and two parallel inverse decoders (L2R and R2L). The two decoders are enhanced via mutual distillation, which involves one-to-one knowledge transfer at each training step, making full use of the complementary information from the two inverse directions. Moreover, to handle mathematical symbols at diverse scales, an Attention Aggregation Module (AAM) is proposed to effectively integrate multi-scale coverage attentions. Notably, in the inference phase, given that the model has already learned knowledge from both directions, we only use the L2R branch for inference, keeping the original parameter size and inference speed. Extensive experiments demonstrate that our proposed approach achieves recognition accuracies of 56.85% on CROHME 2014, 52.92% on CROHME 2016, and 53.96% on CROHME 2019 without data augmentation or model ensembling, substantially outperforming the state-of-the-art methods. The source code is available at https://github.com/XH-B/ABM.
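To make the mutual-distillation idea concrete, the following is a minimal PyTorch-style sketch, not the authors' implementation: the function name is illustrative, the time-axis flip stands in for the one-to-one step alignment between the two decoding orders, and padding/masking is omitted. Each branch distills from the other's softened predictions via a symmetric KL term.

```python
import torch
import torch.nn.functional as F

def mutual_distillation_loss(logits_l2r, logits_r2l, temperature=1.0):
    """KL-based mutual learning between the two inverse decoders.

    logits_l2r: (batch, seq_len, vocab) L2R decoder outputs.
    logits_r2l: (batch, seq_len, vocab) R2L decoder outputs in R2L order;
        flipping the time axis gives a one-to-one step alignment
        (padding/masking omitted for brevity).
    """
    logits_r2l = torch.flip(logits_r2l, dims=[1])

    log_p_l2r = F.log_softmax(logits_l2r / temperature, dim=-1)
    log_p_r2l = F.log_softmax(logits_r2l / temperature, dim=-1)

    # Each branch distills from the other's (detached) soft predictions,
    # so knowledge transfers in both directions at every training step.
    kl_l2r = F.kl_div(log_p_l2r, log_p_r2l.detach(),
                      reduction="batchmean", log_target=True)
    kl_r2l = F.kl_div(log_p_r2l, log_p_l2r.detach(),
                      reduction="batchmean", log_target=True)
    return (kl_l2r + kl_r2l) * temperature ** 2
```

In training, a term like this would be added to each branch's standard cross-entropy loss; at inference, only the L2R branch is kept, which is why the parameter size and decoding speed remain unchanged.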