Handwritten Mathematical Expression Recognition via Attention Aggregation based Bi-directional Mutual Learning

Handwritten mathematical expression recognition aims to automatically generate LaTeX sequences from given images. Currently, attention-based encoder-decoder models are widely used in this task. They typically generate target sequences in a left-to-right (L2R) manner, leaving the right-to-left (R2L) context unexploited. In this paper, we propose an Attention aggregation based Bi-directional Mutual learning Network (ABM), which consists of one shared encoder and two parallel inverse decoders (L2R and R2L). The two decoders are enhanced via mutual distillation, which involves one-to-one knowledge transfer at each training step and makes full use of the complementary information from the two inverse directions. Moreover, to deal with mathematical symbols at diverse scales, an Attention Aggregation Module (AAM) is proposed to effectively integrate multi-scale coverage attentions. Notably, because the model has already learned knowledge from both directions during training, we use only the L2R branch at inference, keeping the original parameter size and inference speed. Extensive experiments demonstrate that the proposed approach achieves recognition accuracy of 56.85% on CROHME 2014, 52.92% on CROHME 2016, and 53.96% on CROHME 2019 without data augmentation or model ensembling, substantially outperforming state-of-the-art methods. The source code is available at https://github.com/XH-B/ABM.
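The mutual distillation between the two decoders can be illustrated with a small sketch. This is not the authors' implementation: it assumes a symmetric KL-divergence loss over temperature-softened per-step distributions, that both branches emit logits of shape (T, V) with no start/end tokens, and that flipping the R2L output along the time axis aligns it with the L2R output. The function names and the temperature value are hypothetical.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), summed over the vocabulary and averaged over time steps."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))

def mutual_distillation_loss(l2r_logits, r2l_logits, temperature=2.0):
    """Return (loss for the L2R branch, loss for the R2L branch).

    l2r_logits: (T, V) logits predicting symbols in order y_1 .. y_T.
    r2l_logits: (T, V) logits predicting symbols in order y_T .. y_1.
    Assumption: reversing the R2L time axis aligns step t of both branches.
    """
    p_l2r = softmax(l2r_logits, temperature)
    p_r2l = softmax(r2l_logits[::-1], temperature)  # align to L2R order
    loss_l2r = kl_divergence(p_r2l, p_l2r)  # R2L acts as teacher for L2R
    loss_r2l = kl_divergence(p_l2r, p_r2l)  # L2R acts as teacher for R2L
    return loss_l2r, loss_r2l

# Toy example: 5 decoding steps over a 7-symbol vocabulary.
rng = np.random.default_rng(0)
l2r = rng.normal(size=(5, 7))
r2l = rng.normal(size=(5, 7))
loss_l2r, loss_r2l = mutual_distillation_loss(l2r, r2l)
print(loss_l2r, loss_r2l)
```

In a training loop, each distillation term would typically be added to that branch's own cross-entropy loss. At inference only the L2R branch is kept, as the abstract states, so this loss adds no parameters or decoding cost.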


Related content

Attention mechanism


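The attention mechanism was first proposed in the field of computer vision, but it arguably took off with the Google DeepMind paper "Recurrent Models of Visual Attention" [14], which applied attention to an RNN model for image classification. Bahdanau et al. then used an attention-like mechanism in "Neural Machine Translation by Jointly Learning to Align and Translate" [1] to perform translation and alignment jointly in machine translation; their work is generally regarded as the first to apply attention to NLP. Attention-based extensions of RNN models were subsequently applied to a wide range of NLP tasks. More recently, how to use attention in CNNs has also become an active research topic. The figure below shows the rough trend of attention research.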

[CVPR 2022] Nested Collaborative Learning for Long-Tailed Visual Recognition

Zhuanzhi Member Service

13+ reads · March 19, 2022

A Brief Overview of Contrastive Learning

Zhuanzhi Member Service

90+ reads · June 29, 2021

Optimization Algorithms on Deep Learning (73-page slide deck)

Zhuanzhi Member Service

135+ reads · June 16, 2021