Citation-based Information Retrieval (IR) methods for scientific documents have proven effective for applications such as plagiarism detection or literature recommender systems in academic disciplines that use many references. In science, technology, engineering, and mathematics, researchers often employ mathematical concepts through formula notation to refer to prior knowledge. Our long-term goal is to generalize citation-based IR methods and apply the generalized method to both classical references and mathematical concepts. In this paper, we suggest how mathematical formulas could be cited and define a Formula Concept Retrieval task with two subtasks: Formula Concept Discovery (FCD) and Formula Concept Recognition (FCR). FCD aims to define and explore a 'Formula Concept' that names a bundle of equivalent representations of a formula, whereas FCR is designed to match a given formula to a previously assigned unique mathematical concept identifier. We present machine learning-based approaches to address the FCD and FCR tasks and evaluate them on a standardized test collection (NTCIR arXiv dataset). Our FCD approach yields a precision of 68% for retrieving equivalent representations of frequent formulas and a recall of 72% for extracting the formula name from the surrounding text. FCD and FCR enable the citation of formulas within mathematical documents and facilitate semantic search and question answering, as well as document similarity assessments for plagiarism detection or recommender systems.