After the initial release of a machine learning algorithm, the model can be fine-tuned by retraining on subsequently gathered data, adding newly discovered features, or making other changes. Each modification introduces a risk of deteriorating performance and must be validated on a test dataset. It may not always be practical to assemble a new test dataset for each modification, especially when most modifications are minor or are implemented in rapid succession. Recent works have shown how one can repeatedly test modifications on the same dataset and protect against overfitting by (i) discretizing test results along a grid and (ii) applying a Bonferroni correction to adjust for the total number of modifications considered by an adaptive developer. However, the standard Bonferroni correction is overly conservative when most modifications are beneficial and/or highly correlated. This work investigates more powerful approaches using alpha-recycling and sequentially-rejective graphical procedures (SRGPs). We introduce novel extensions that account for correlation between adaptively chosen algorithmic modifications. In empirical analyses, the SRGPs control the error rate of approving unacceptable modifications and approve a substantially larger number of beneficial modifications than previous approaches.
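To make the contrast with Bonferroni concrete, the sketch below implements a generic sequentially-rejective graphical procedure in the sense of Bretz et al. (2009), the framework the abstract refers to: each candidate modification starts with a share of the total alpha budget, and when a modification is approved its alpha is recycled to the remaining candidates along a transition graph. This is a minimal illustration under assumed inputs, not the paper's own procedure; the function name `srgp_reject`, the chain-shaped graph in the usage example, and the fixed level of 0.05 are all illustrative choices.

```python
import numpy as np

def srgp_reject(p, alpha0, G, alpha=0.05):
    """Generic sequentially-rejective graphical procedure (Bretz et al., 2009).

    p      : p-values, one per candidate modification
    alpha0 : initial alpha weights (fractions of `alpha`), summing to <= 1
    G      : transition matrix; G[i, j] is the fraction of H_i's level
             passed to H_j when H_i is rejected (G[i, i] = 0, rows sum <= 1)
    Returns a boolean array marking approved (rejected-null) modifications.
    """
    p = np.asarray(p, dtype=float)
    a = np.asarray(alpha0, dtype=float) * alpha
    G = np.asarray(G, dtype=float).copy()
    active = np.ones(len(p), dtype=bool)
    rejected = np.zeros(len(p), dtype=bool)
    while True:
        # reject any active hypothesis whose p-value meets its current level
        candidates = [i for i in np.where(active)[0] if p[i] <= a[i]]
        if not candidates:
            return rejected
        j = candidates[0]
        rejected[j] = True
        active[j] = False
        # recycle H_j's alpha and update the transition weights (Bretz update)
        a_new, G_new = a.copy(), np.zeros_like(G)
        for l in np.where(active)[0]:
            a_new[l] = a[l] + a[j] * G[j, l]
            for k in np.where(active)[0]:
                if l == k:
                    continue
                denom = 1.0 - G[l, j] * G[j, l]
                if denom > 0:
                    G_new[l, k] = (G[l, k] + G[l, j] * G[j, k]) / denom
        a_new[j] = 0.0
        a, G = a_new, G_new

# Usage: two modifications tested in order, with all alpha flowing forward
# on approval (a simple chain graph).
p_vals = [0.01, 0.03]
alpha0 = [1.0, 0.0]            # spend the full budget on the first test
G = [[0.0, 1.0],
     [0.0, 0.0]]
print(srgp_reject(p_vals, alpha0, G, alpha=0.05))  # -> [ True  True]
```

Under a plain Bonferroni split of the same budget, each modification would be tested at 0.05 / 2 = 0.025, so the second modification (p = 0.03) would not be approved; recycling the first test's alpha is what recovers it.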