This paper addresses the extraction of entities and the semantic relations between them from scientific texts, where scientific terms are treated as entities. We present a dataset that includes annotations for the two tasks and develop a system called TERMinator to study the influence of language models on term recognition and to compare different approaches to relation extraction. Experiments show that language models pre-trained on the target language do not always show the best performance. In addition, adding some heuristic approaches may improve the overall quality on a particular task. The developed tool and the annotated corpus are publicly available at https://github.com/iis-research-team/terminator and may be useful for other researchers.