We describe a rule-based approach for the automatic acquisition of salient scientific entities from Computational Linguistics (CL) scholarly article titles. Two observations motivated the approach: (i) authors tend to note the salient aspects of an article's contribution in its title; and (ii) the salient terms follow pattern regularities that can be expressed as a set of rules. We selected only those lexico-syntactic patterns that were easily recognizable, occurred frequently, and positionally indicated a scientific entity type. The rules were developed on a collection of 50,237 CL titles covering all articles in the ACL Anthology. In total, 19,799 research problems, 18,111 solutions, 20,033 resources, 1,059 languages, 6,878 tools, and 21,687 methods were extracted at an average precision of 75%.
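To make the idea of a positional lexico-syntactic rule concrete, the following is a minimal sketch of how a single title pattern might be encoded. The pattern "<solution> for <research problem> [using <method>]", the name `TITLE_RULE`, and the function `extract_entities` are assumptions chosen for illustration; they are not taken from the rule set described above, which is not reproduced here.

```python
import re
from typing import Dict

# Hypothetical rule: "<solution> for <research problem> [using <method>]".
# Illustrative only; this is not one of the paper's actual rules.
TITLE_RULE = re.compile(
    r"^(?P<solution>.+?)\s+for\s+(?P<research_problem>.+?)"
    r"(?:\s+using\s+(?P<method>.+))?$",
    re.IGNORECASE,
)


def extract_entities(title: str) -> Dict[str, str]:
    """Return the entity types matched by the hypothetical rule, or {} if none."""
    match = TITLE_RULE.match(title.strip())
    if match is None:
        return {}
    # Drop optional groups (here: method) that did not participate in the match.
    return {name: text for name, text in match.groupdict().items() if text is not None}


if __name__ == "__main__":
    print(extract_entities("A Neural Model for Word Sense Disambiguation using Attention"))
```

Under this sketch, the connectors "for" and "using" act as positional cues: the text before "for" is typed as a solution, the text after it as a research problem, and anything following "using" as a method. A title without these cues yields no entities, which is the kind of trade-off between coverage and precision that a rule-based approach would need to manage.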