This study introduces an approach for estimating the uncertainty in bibliometric indicator values that is caused by data errors. The approach uses Bayesian regression models, estimated from empirical data samples, to predict error-free data. Through direct Monte Carlo simulation (drawing many replicates of predicted data from the estimated regression models for the same input data), probability distributions for indicator values are obtained, which quantify their uncertainty due to data errors. We demonstrate how uncertainty in base quantities, such as a unit's number of publications of certain document types and the number of citations of a publication, can be propagated through a measurement model into final indicator values. Synthetic examples illustrate the method, and real bibliometric research evaluation data show its application in practice. Although this contribution considers only two of a larger number of known bibliometric error categories, and can therefore account for only part of the total uncertainty due to inaccuracies, the real-data example reveals that average citation impact scores of research groups' publications need to be used very cautiously, as they often have large margins of error resulting from data inaccuracies.
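
The Monte Carlo propagation step can be illustrated with a minimal sketch. It is not the models estimated in this study: the linear error model, its parameter values, the function names `simulate_indicator` and `interval`, and the citation counts are all hypothetical stand-ins chosen for illustration. Each replicate draws one set of error-model parameters, predicts error-free citation counts for the same observed input, and recomputes the indicator (here, the mean citations per publication).

```python
import random
import statistics


def simulate_indicator(observed_citations, n_replicates=2000, seed=0):
    """Return Monte Carlo replicates of the mean citation score.

    Assumption: the parameter draws below merely imitate posterior draws
    from a fitted Bayesian regression; the numbers are invented.
    """
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_replicates):
        # One hypothetical draw of the error-model parameters.
        intercept = rng.gauss(0.0, 0.2)
        slope = rng.gauss(1.05, 0.03)
        noise_sd = 1.0
        # Predict error-free counts for the same observed input data.
        predicted = [
            max(0.0, intercept + slope * c + rng.gauss(0.0, noise_sd))
            for c in observed_citations
        ]
        # Propagate the base quantities into the indicator value.
        replicates.append(statistics.fmean(predicted))
    return replicates


def interval(values, lower=0.025, upper=0.975):
    """Empirical quantile interval taken from the sorted replicates."""
    ordered = sorted(values)
    n = len(ordered)
    return ordered[int(lower * (n - 1))], ordered[int(upper * (n - 1))]


# Invented observed citation counts for one research group.
observed = [0, 3, 12, 5, 41, 7, 2, 19]
draws = simulate_indicator(observed)
low, high = interval(draws)
print(f"median {statistics.median(draws):.2f}, 95% interval [{low:.2f}, {high:.2f}]")
```

The width of the resulting interval plays the role of the margin of error discussed above. In a real application the parameter draws would come from the fitted regression models, not from fixed distributions.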