Handwritten mathematical expression recognition (HMER) is a challenging task with many potential applications. Recent HMER methods have achieved outstanding performance with an encoder-decoder architecture. However, these methods follow a paradigm in which the prediction is made "from one character to another", which inevitably yields prediction errors when the structure of a mathematical expression is complicated or the handwriting is crabbed. In this paper, we propose a simple and efficient method for HMER, which is the first to incorporate syntax information into an encoder-decoder network. Specifically, we present a set of grammar rules for converting the LaTeX markup sequence of each expression into a parse tree; we then model markup sequence prediction as a tree traversal process with a deep neural network. In this way, the proposed method can effectively describe the syntactic context of expressions, alleviating the structure prediction errors of HMER. Experiments on three benchmark datasets demonstrate that our method achieves better recognition performance than prior art. To further validate the effectiveness of our method, we create a large-scale dataset consisting of 100k handwritten mathematical expression images acquired from ten thousand writers. The source code, new dataset, and pre-trained models of this work will be made publicly available.
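To make the idea of turning a LaTeX markup sequence into a parse tree and reading the markup back off by traversal more concrete, the following is a minimal sketch. The toy grammar, the node labels (`expr`, `sup`, `sub`), and the helper names (`tokenize`, `parse`, `to_markup`) are all assumptions made for illustration; they are not the grammar rules or the network described in the abstract, and no learning is involved here.

```python
import re

# Hypothetical toy grammar (an assumption, NOT the paper's rule set):
#   expr -> term*
#   term -> atom (('^' | '_') atom)*
#   atom -> SYMBOL | '{' expr '}' | '\frac' atom atom | '\sqrt' atom
TOKEN_RE = re.compile(r"\\[A-Za-z]+|[{}^_]|[^\s{}^_\\]")

# Number of arguments taken by the structural commands this sketch knows.
ARITY = {"\\frac": 2, "\\sqrt": 1}


def tokenize(latex):
    """Split a LaTeX string into commands, structural characters, and symbols."""
    return TOKEN_RE.findall(latex)


class Parser:
    """Recursive-descent parser producing (label, children) tuples."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        if self.pos >= len(self.tokens):
            raise ValueError("unexpected end of input")
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def parse_expr(self, closing=None):
        children = []
        while self.peek() is not None and self.peek() != closing:
            children.append(self.parse_term())
        return ("expr", children)

    def parse_term(self):
        node = self.parse_atom()
        # Attach superscripts and subscripts to the node built so far.
        while self.peek() in ("^", "_"):
            label = "sup" if self.take() == "^" else "sub"
            node = (label, [node, self.parse_atom()])
        return node

    def parse_atom(self):
        tok = self.take()
        if tok == "{":
            group = self.parse_expr(closing="}")
            if self.peek() != "}":
                raise ValueError("unbalanced braces")
            self.take()
            return group
        if tok in ARITY:
            return (tok, [self.parse_atom() for _ in range(ARITY[tok])])
        if tok in ("}", "^", "_"):
            raise ValueError(f"unexpected token {tok!r}")
        return (tok, [])


def parse(latex):
    return Parser(tokenize(latex)).parse_expr()


def to_markup(node):
    """Depth-first traversal that emits a normalized token sequence."""
    label, children = node
    if label == "expr":
        return [tok for child in children for tok in to_markup(child)]
    if label in ("sup", "sub"):
        base, script = children
        op = "^" if label == "sup" else "_"
        return to_markup(base) + [op, "{"] + to_markup(script) + ["}"]
    if label in ARITY:
        out = [label]
        for child in children:
            out += ["{"] + to_markup(child) + ["}"]
        return out
    return [label]


if __name__ == "__main__":
    tree = parse(r"\frac{a^2}{b}")
    print(tree)
    print(" ".join(to_markup(tree)))
```

In this sketch the traversal is deterministic; in a recognition model of the kind the abstract describes, each step of such a traversal would instead be predicted from image features, so the tree structure constrains which tokens can follow one another.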