In high-dimensional prediction settings, reliably estimating test performance remains challenging. To address this challenge, a novel performance estimation framework is presented. This framework, called Learn2Evaluate, builds on learning curves: it fits a smooth monotone curve that describes test performance as a function of the sample size. Learn2Evaluate has several advantages over commonly applied performance estimation methodologies. Firstly, a learning curve offers a graphical overview of a learner. This overview helps to assess the potential benefit of adding training samples, and it provides a more complete comparison between learners than performance estimates at a fixed subsample size. Secondly, a learning curve facilitates estimation of the performance at the total sample size rather than at a subsample size. Thirdly, Learn2Evaluate allows the computation of a theoretically justified and useful lower confidence bound. Furthermore, this bound may be tightened by performing a bias correction. The benefits of Learn2Evaluate are illustrated by a simulation study and applications to omics data.
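To make the learning-curve idea concrete, the following is a minimal sketch of fitting a smooth monotone curve to performance estimates at several subsample sizes and reading off the value at the total sample size. It assumes an inverse power law form, a + b * n^(-c), which is one common choice for learning curves and is not necessarily the curve family used by Learn2Evaluate. The function names, the exponent grid, and all numbers are hypothetical and serve only as illustration; the lower confidence bound and bias correction are not covered.

```python
def fit_inverse_power_law(sizes, scores, exponents=None):
    """Least-squares fit of score ~ a + b * n**(-c).

    The exponent c is chosen from a grid of positive values (an assumption
    of this sketch); for each c, a and b follow from simple linear
    regression of the scores on x = n**(-c).
    """
    if exponents is None:
        exponents = [k / 100 for k in range(5, 301, 5)]  # 0.05, 0.10, ..., 3.00
    m = len(sizes)
    mean_y = sum(scores) / m
    best = None
    for c in exponents:
        x = [n ** (-c) for n in sizes]
        mean_x = sum(x) / m
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, scores))
        b = sxy / sxx
        a = mean_y - b * mean_x
        sse = sum((a + b * xi - yi) ** 2 for xi, yi in zip(x, scores))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    _, a, b, c = best
    return a, b, c


def predict(params, n):
    """Evaluate the fitted curve at sample size n."""
    a, b, c = params
    return a + b * n ** (-c)


# Hypothetical inputs: mean test AUC of one learner at four subsample sizes,
# e.g. averaged over repeated subsampling, from a data set with 100 samples.
subsample_sizes = [20, 40, 60, 80]
mean_auc = [0.62, 0.70, 0.74, 0.76]
total_n = 100

params = fit_inverse_power_law(subsample_sizes, mean_auc)
estimate_at_total_n = predict(params, total_n)
print(f"a={params[0]:.3f}, b={params[1]:.3f}, c={params[2]:.2f}")
print(f"estimated performance at n={total_n}: {estimate_at_total_n:.3f}")
```

Because c is positive, the fitted curve is monotone in n, and it is increasing whenever b is negative, as is the case for scores that improve with sample size. Evaluating the curve at the total sample size gives a point estimate there instead of at a subsample size, which is the second advantage listed above.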