We provide a comprehensive theory for conducting in-sample statistical inference about receiver operating characteristic (ROC) curves that are based on predicted values from a first-stage model with estimated parameters (such as a logit regression). The term "in-sample" refers to the practice of using the same data for model estimation (training) and subsequent evaluation, i.e., the construction of the ROC curve. We show that in this case the first-stage estimation error generally has a non-negligible impact on the asymptotic distribution of the ROC curve, and we develop the appropriate pointwise and functional limit theory. We propose methods for simulating the distribution of the limit process and show how to use the results in practice to compare ROC curves.
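To make the "in-sample" setting concrete, the following is a minimal sketch, assuming a one-covariate logit first stage fitted to simulated data. The data-generating process, the sample size, and the helper names `fit_logit`, `empirical_roc`, and `trapezoid_auc` are illustrative assumptions and do not come from the paper. The sketch only constructs the in-sample ROC curve; it does not implement the limit theory or the simulation methods described above.

```python
import math
import random


def fit_logit(x, y, iters=25):
    """Fit P(y=1|x) = sigmoid(a + b*x) by Newton-Raphson (maximum likelihood)."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        # Score vector (g0, g1) and negative Hessian entries (h00, h01, h11).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g in closed form.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def empirical_roc(scores, y):
    """Return the ROC points (FPR, TPR) traced out by lowering the threshold."""
    n1 = sum(y)
    n0 = len(y) - n1
    pairs = sorted(zip(scores, y), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (s, yi) in enumerate(pairs):
        if yi == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last observation sharing this score (ties).
        if i + 1 == len(pairs) or pairs[i + 1][0] != s:
            points.append((fp / n0, tp / n1))
    return points


def trapezoid_auc(points):
    """Area under the piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Simulated sample (assumed design): x ~ N(0, 1), y ~ Bernoulli(sigmoid(-0.5 + x)).
random.seed(0)
n = 500
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [1 if random.random() < 1.0 / (1.0 + math.exp(0.5 - xi)) else 0 for xi in x]

# First stage: estimate the logit parameters on the full sample.
a_hat, b_hat = fit_logit(x, y)

# Evaluation on the SAME sample: the scores depend on the estimated parameters,
# which is the source of the extra estimation error discussed in the abstract.
scores = [a_hat + b_hat * xi for xi in x]
roc_points = empirical_roc(scores, y)
auc_value = trapezoid_auc(roc_points)
print(f"a_hat={a_hat:.3f}, b_hat={b_hat:.3f}, in-sample AUC={auc_value:.3f}")
```

The linear index `a_hat + b_hat * x` is used as the score because the ROC curve is invariant to strictly increasing transformations, so it gives the same curve as the fitted probabilities. An out-of-sample version would instead compute the scores on data not used in `fit_logit`.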