This study introduces an approach for estimating the uncertainty in bibliometric indicator values that is caused by data errors. The approach uses Bayesian regression models, estimated from empirical data samples, to predict error-free data. Direct Monte Carlo simulation, in which predicted data are drawn from the estimated regression models a large number of times for the same input data, yields probability distributions for indicator values, and these distributions quantify the uncertainty due to data errors. It is demonstrated how uncertainty in base quantities, such as a unit's number of publications of certain document types and the number of citations of a publication, can be propagated through a measurement model into final indicator values. The method can be used to estimate the uncertainty of indicator values arising from error sources with known error distributions. The approach is illustrated with simple synthetic examples for instructive purposes and with real bibliometric research evaluation data to show its possible application in practice.
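The Monte Carlo propagation described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the study's implementation: two conjugate, intercept-only Bayesian models stand in for the regression models, the indicator is taken to be mean citations per publication, and all names and numbers (`simulate_indicator`, `k_correct`, `n_checked`, `missed_total`, `n_verified`, the recorded counts) are hypothetical.

```python
import numpy as np


def simulate_indicator(recorded_citations, k_correct, n_checked,
                       missed_total, n_verified, n_draws=10_000, seed=0):
    """Draw the indicator 'mean citations per publication' n_draws times.

    Assumed error models (illustrative stand-ins for Bayesian regressions):
      * document type: each recorded publication truly belongs to the counted
        document type with probability p; p has a Beta posterior from a
        verified sample (k_correct of n_checked records were correct).
      * citations: each publication misses Poisson(lam) citations; lam has a
        Gamma posterior from a verified sample (missed_total citations missed
        across n_verified publications), using a Gamma(1, 1) prior.
    """
    rng = np.random.default_rng(seed)
    recorded = np.asarray(recorded_citations)
    n_pubs = recorded.size

    # Posterior draws of the model parameters (one per simulation run).
    p = rng.beta(1 + k_correct, 1 + n_checked - k_correct, size=n_draws)
    lam = rng.gamma(shape=1 + missed_total,
                    scale=1.0 / (1 + n_verified), size=n_draws)

    # Posterior predictive draws of the error-free base quantities.
    keep = rng.random((n_draws, n_pubs)) < p[:, None]
    missed = rng.poisson(lam[:, None], size=(n_draws, n_pubs))
    true_citations = recorded[None, :] + missed

    # Measurement model: total citations divided by number of publications.
    n_kept = keep.sum(axis=1)
    total = (true_citations * keep).sum(axis=1)
    with np.errstate(invalid="ignore", divide="ignore"):
        # The indicator is undefined (NaN) if a run keeps no publication.
        return np.where(n_kept > 0, total / n_kept, np.nan)


# Hypothetical recorded citation counts for one unit's publications.
recorded = [12, 0, 3, 45, 7, 1, 9, 20]
draws = simulate_indicator(recorded, k_correct=95, n_checked=100,
                           missed_total=30, n_verified=100)
lo, med, hi = np.nanpercentile(draws, [2.5, 50, 97.5])
print(f"median {med:.2f}, 95% interval [{lo:.2f}, {hi:.2f}]")
```

Each run first draws the model parameters from their posteriors and then draws predicted error-free data given those parameters, so the spread of `draws` reflects both parameter uncertainty and predictive noise. The percentile interval summarizes the resulting uncertainty of the indicator value for the same recorded input data.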