In this paper, we present a system for information extraction from scientific texts in the Russian language. The system performs several tasks in an end-to-end manner: term recognition, extraction of relations between terms, and linking of terms to entities in a knowledge base. These tasks are extremely important for information retrieval, recommendation systems, and classification. The advantage of the implemented methods is that the system does not require a large amount of labeled data, which saves the time and effort of data labeling; the system can therefore be applied in low- and mid-resource settings. The source code is publicly available and can be used for different research purposes.