This study presents an ensemble approach to identifying and analyzing research articles in rapidly evolving fields, using Artificial Intelligence (AI) as a case study. Our approach applies a decision tree, SciBERT, and regular expression matching to different fields of each article, and uses an SVM to merge the outputs of these models. We evaluated the method on a manually labeled dataset and found that the combined approach captured around 97% of AI-related articles in the Web of Science (WoS) corpus with a precision of 0.92. This represents a 0.15 increase in F1 score over existing search-term-based approaches. We then analyzed publication volume trends and common research themes. Compared with existing methods, our ensemble approach revealed a higher degree of interdisciplinarity and identified more articles in certain subfields, such as feature extraction and optimization. This study demonstrates the potential of our approach as a tool for accurately identifying scholarly articles and for providing insights into the volume and content of a research area.
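As a rough check on the reported figures, the F1 score can be recomputed from the stated precision and coverage. The sketch below assumes that the "around 97%" capture rate corresponds to recall on the labeled dataset; the function name `f1_score` is illustrative and not taken from the study.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Assumption: the ~97% capture rate is read as recall = 0.97.
# The precision of 0.92 is the value reported in the abstract.
ensemble_f1 = f1_score(precision=0.92, recall=0.97)
print(round(ensemble_f1, 3))
```

Under that assumption the ensemble's F1 comes out at roughly 0.94, so the reported 0.15 gain would place the search-term baseline somewhere near 0.79; the baseline value itself is not stated here and is only implied by this arithmetic.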