This paper considers the two-dataset problem, in which data are collected from two potentially different populations that share common aspects. The problem arises when data are collected by two different types of researchers or from two different sources. Ignoring knowledge about the data collection process may lead to invalid conclusions. To address this problem, the paper develops statistical regression models that focus on differences in measurement and proposes two prediction errors that help to evaluate the underlying data collection process. As a consequence, the heterogeneity or similarity of the set of predictors can be discussed in terms of prediction. Two real datasets are used to illustrate the method.