The scholarly publication space is growing steadily, not just in volume but also in complexity, owing to collaboration between individuals within and across fields of research. This paper presents a hierarchical classification system that automatically categorizes a scholarly publication, using its abstract, into a three-tier hierarchical label set (discipline-field-subfield). The system enables a holistic view of the interdependence of research activities across these tiers, in terms of knowledge production through articles and impact through citations. The classification system (44 disciplines, 738 fields, 1,501 subfields) uses, and scales to, 160 million abstract snippets from Microsoft Academic Graph (version 2018-05-17), relying on batch training in a modularized and distributed fashion to address and assess interdisciplinary and inter-field classifications. In addition, we explore multi-class classification in both the single-label and multi-label settings. In total, we conducted 3,140 experiments. Across all models (Convolutional Neural Networks, Recurrent Neural Networks, Transformers), classification accuracy exceeds 90% in 77.84% of the single-label and 78.83% of the multi-label classifications. We demonstrate the advantages of our classification through its ability to better align research texts and output with disciplines, to classify them adequately in an automated way, and to capture the degree of interdisciplinarity of a publication, which enables downstream analytics such as field interdisciplinarity. This system (a set of pretrained models) can serve as the backbone of an interactive system for indexing scientific publications.
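To make the three-tier scheme concrete, the following is a minimal sketch of how a modular discipline-field-subfield classifier could route an abstract from one tier to the next, in both the single-label and multi-label settings. It is an illustration under stated assumptions, not the system described above: top-down routing is assumed rather than confirmed by the abstract, the keyword-based scorer is a toy stand-in for the trained neural models, the label names are invented examples rather than entries from the actual taxonomy, and all function names (`keyword_scorer`, `pick`, `classify`) are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# A "model" is any callable mapping an abstract to per-label scores in [0, 1].
# In a real system this would be a trained CNN/RNN/Transformer; here it is a toy.
Scorer = Callable[[str], Dict[str, float]]
Path = Tuple[str, ...]


def keyword_scorer(keywords: Dict[str, List[str]]) -> Scorer:
    """Toy stand-in for a classifier: fraction of a label's keywords found in the text."""
    def score(text: str) -> Dict[str, float]:
        words = set(text.lower().split())
        return {label: sum(k in words for k in kws) / len(kws)
                for label, kws in keywords.items()}
    return score


def pick(scores: Dict[str, float], multi_label: bool, threshold: float) -> List[str]:
    """Single-label: best-scoring label. Multi-label: every label at or above threshold."""
    if not scores:
        return []
    if multi_label:
        return sorted(label for label, s in scores.items() if s >= threshold)
    return [max(scores, key=scores.get)]


def classify(text: str, root: Scorer, children: Dict[Path, Scorer],
             multi_label: bool = False, threshold: float = 0.5,
             depth: int = 3) -> List[Path]:
    """Route the text tier by tier; each path prefix selects the next module to apply."""
    paths: List[Path] = [()]
    for _ in range(depth):
        next_paths: List[Path] = []
        for path in paths:
            model = root if not path else children.get(path)
            labels = pick(model(text), multi_label, threshold) if model else []
            if labels:
                next_paths.extend(path + (label,) for label in labels)
            else:
                next_paths.append(path)  # no deeper module or no confident label
        paths = next_paths
    return [p for p in paths if p]


# Illustrative labels and keywords only (not taken from the paper's taxonomy).
root = keyword_scorer({"Computer Science": ["neural", "classification"],
                       "Biology": ["protein", "cell"]})
children = {
    ("Computer Science",): keyword_scorer({"Machine Learning": ["neural", "training"],
                                           "Databases": ["query", "index"]}),
    ("Computer Science", "Machine Learning"): keyword_scorer(
        {"Deep Learning": ["neural", "networks"], "Kernel Methods": ["kernel", "svm"]}),
    ("Biology",): keyword_scorer({"Molecular Biology": ["protein", "folding"],
                                  "Ecology": ["species", "habitat"]}),
}
text = "neural networks for protein classification with batch training"

single = classify(text, root, children)
multi = classify(text, root, children, multi_label=True)
print(single)
print(multi)
```

In this sketch the multi-label setting can return several paths of different depths for one abstract, which is one simple way to expose how strongly a publication spans more than one discipline.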