We investigate systems for extracting mathematical entities from English texts in category theory, as a first step toward constructing a mathematical knowledge graph. We consider four term extractors and compare their results. This small experiment showcases some of the issues that arise in extracting terms from noisy domain text and in evaluating the extracted terms. We also make available two open corpora of research mathematics, both in category theory: a small corpus of 755 abstracts from the journal Theory and Applications of Categories (TAC) (3,188 sentences), and a larger corpus from the nLab community wiki (15,000 sentences).
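As a rough illustration of what comparing the results of several term extractors can involve, the sketch below computes pairwise Jaccard overlap between the term sets returned by different extractors. The choice of Jaccard overlap, the normalization step, the extractor names, and the toy term lists are all assumptions made for illustration; the abstract does not say which comparison measure or which extractors were used.

```python
from __future__ import annotations

from itertools import combinations


def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so surface variants compare equal."""
    return " ".join(term.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap |a & b| / |a | b|; defined here as 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(outputs: dict[str, list[str]]) -> dict[tuple[str, str], float]:
    """Return the Jaccard overlap for every unordered pair of extractors."""
    term_sets = {name: {normalize(t) for t in terms} for name, terms in outputs.items()}
    return {
        (x, y): jaccard(term_sets[x], term_sets[y])
        for x, y in combinations(sorted(term_sets), 2)
    }


# Hypothetical extractor outputs (not taken from the paper).
toy_outputs = {
    "extractor_a": ["monoidal category", "Functor", "natural transformation"],
    "extractor_b": ["functor", "natural  transformation", "adjunction"],
    "extractor_c": ["category", "functor"],
}

overlaps = pairwise_overlap(toy_outputs)
for pair, score in overlaps.items():
    print(pair, round(score, 2))
```

Exact-match overlap of this kind counts "category" and "monoidal category" as unrelated terms, which hints at why evaluating extracted terms is harder than the arithmetic suggests.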