Mathematical notation, i.e., the writing system used to communicate concepts in mathematics, encodes valuable information for a variety of information search and retrieval systems. Yet, mathematical notation remains largely unused by today's systems. In this paper, we present the first in-depth study of the distributions of mathematical notation in two large scientific corpora: the open-access arXiv (2.5B mathematical objects) and zbMATH, the mathematical reviewing service for pure and applied mathematics (61M mathematical objects). Our study lays a foundation for future research projects on mathematical information retrieval for large scientific corpora. Further, we demonstrate the relevance of our results to a variety of use cases, for example, assisting semantic extraction systems, improving scientific search engines, and facilitating specialized math recommendation systems. The contributions of our research are as follows: (1) we present the first distributional analysis of mathematical formulae on arXiv and zbMATH; (2) we retrieve relevant mathematical objects for given textual search queries (e.g., linking $P_{n}^{(\alpha, \beta)}\!\left(x\right)$ with 'Jacobi polynomial'); (3) we extend zbMATH's search engine by providing relevant mathematical formulae; and (4) we exemplify the applicability of the results by presenting auto-completion for math inputs as the first contribution to math recommendation systems. To expedite future research projects, we have made our source code and data available.
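As a rough illustration of how a distributional analysis (contribution 1) can feed auto-completion for math inputs (contribution 4), the sketch below counts how often each mathematical object occurs in a toy corpus and ranks completion candidates for a typed prefix by that count. It is a hypothetical simplification, not the authors' implementation: mathematical objects are reduced to plain LaTeX strings, and the corpus and the helper names `count_objects` and `complete` are invented for illustration.

```python
from collections import Counter


def count_objects(documents):
    """Count occurrences of each math object across all documents.

    Hypothetical simplification: each document is already a list of
    extracted formula strings; a real system would parse formulae
    into structured (e.g., tree) representations first.
    """
    counts = Counter()
    for formulas in documents:
        counts.update(formulas)
    return counts


def complete(prefix, counts, k=3):
    """Return up to k objects that start with `prefix`, most frequent first.

    Ties are broken alphabetically so the result is deterministic.
    """
    matches = [obj for obj in counts if obj.startswith(prefix)]
    matches.sort(key=lambda obj: (-counts[obj], obj))
    return matches[:k]


# Toy corpus (hypothetical): three "documents", each a list of formula strings.
corpus = [
    [r"P_n^{(\alpha,\beta)}(x)", r"\Gamma(z)", r"\Gamma(z)"],
    [r"P_n(x)", r"\Gamma(z)", r"P_n^{(\alpha,\beta)}(x)"],
    [r"P_n(x)", r"\zeta(s)", r"P_n(x)"],
]

counts = count_objects(corpus)
print(complete("P_n", counts))
```

Here `P_n(x)` (3 occurrences) is suggested before `P_n^{(\alpha,\beta)}(x)` (2 occurrences). Raw global frequency is the simplest possible ranking; it ignores the textual context that contribution (2) relies on to link formulae with queries such as 'Jacobi polynomial'.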