Evaluating researchers' output is vital for hiring committees and funding bodies, and it is usually measured via scientific productivity, citations, or a combined metric such as the h-index. Assessing young researchers is particularly critical, because accumulating citations and increasing the h-index take time. Hence, predicting the h-index can help to discover researchers' scientific impact. In addition, identifying the factors that influence this prediction is helpful for researchers seeking ways to improve their impact. This study investigates the effect of author-, paper-, and venue-specific features on the future h-index. For this purpose, we used machine learning methods to predict the h-index and feature analysis techniques to better understand feature impact. Using bibliometric data from Scopus, we defined and extracted two main groups of features. The first, which we call 'prior impact-based features', relates to prior scientific impact and includes the number of publications, received citations, and h-index. The second, 'non-impact-based features', contains features related to author, co-authorship, paper, and venue characteristics. We explored the importance of both groups in predicting the h-index for researchers in three different career phases. We also examined the temporal dimension of prediction performance for the different feature categories to determine which features are more reliable for long- and short-term prediction. In addition, we considered the gender of the authors to examine the role of this author characteristic in the prediction task. Our findings showed that gender has only a very slight effect in predicting the h-index. We found that, in the short term, non-impact-based features are more robust predictors for younger scholars than for senior ones. Also, prior impact-based features lose more of their predictive power than other features in the long term.
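For readers unfamiliar with the target variable, the following is a minimal sketch of the standard h-index definition: the largest h such that at least h of a researcher's papers each have at least h citations. The function name `h_index` and the sample citation counts are illustrative assumptions and are not taken from the study or from Scopus.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations each.

    `citations` is an iterable of per-paper citation counts (hypothetical input).
    """
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            # The paper at this rank still has enough citations to support h = rank.
            h = rank
        else:
            # Counts only decrease from here, so no larger h is possible.
            break
    return h


if __name__ == "__main__":
    # Made-up citation counts for five papers.
    print(h_index([10, 8, 5, 4, 3]))
```

Because the value depends only on accumulated citation counts, it grows slowly for early-career researchers, which is the motivation the abstract gives for predicting it rather than waiting to observe it.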