Investigating the contribution of author- and publication-specific features to scholars' h-index prediction

Evaluating researchers' output is vital for hiring committees and funding bodies, and it is usually measured via scientific productivity, citations, or a combined metric such as the h-index. Assessing young researchers is more critical because it takes time for citations to accumulate and for the h-index to grow. Hence, predicting the h-index can help to gauge researchers' scientific impact. In addition, identifying the factors that influence this prediction is helpful for researchers seeking ways to improve their impact. This study investigates the effect of author-, paper-, and venue-specific features on the future h-index. For this purpose, we used machine learning methods to predict the h-index and feature analysis techniques to advance the understanding of feature impact. Using bibliometric data from Scopus, we defined and extracted two main groups of features. The first group, which we call 'prior impact-based features', relates to prior scientific impact and includes the number of publications, received citations, and the h-index. The second group, 'non-impact-based features', contains features related to author, co-authorship, paper, and venue characteristics. We explored the importance of both groups in predicting the h-index for researchers in three different career phases. We also examined the temporal dimension of predictive performance for the different feature categories to find out which features are more reliable for long- and short-term prediction. In addition, we considered the gender of the authors to examine the role of this author characteristic in the prediction task. Our findings showed that gender has only a very slight effect on predicting the h-index. We found that, in the short term, non-impact-based features are more robust predictors for younger scholars than for senior ones. Also, prior impact-based features lose more of their predictive power than other features in the long term.
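
For readers unfamiliar with the target metric: a scholar's h-index is the largest number h such that h of their papers have each received at least h citations. The sketch below illustrates that standard definition only; the function name `h_index` and the example citation counts are illustrative assumptions and are not taken from the study or its Scopus data.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            # The top `rank` papers all have at least `rank` citations.
            h = rank
        else:
            # Counts only decrease from here, so no larger h is possible.
            break
    return h


# Hypothetical citation counts for one scholar's five papers.
example_citations = [10, 8, 5, 4, 3]
print(h_index(example_citations))
```

Because the metric depends only on per-paper citation counts, it can grow only as citations accumulate, which is why it lags for early-career researchers.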
