Traditional statistics forbids use of test data (a.k.a. holdout data) during training. Dwork et al. 2015 pointed out that current practices in machine learning, whereby researchers build upon each other's models, copying hyperparameters and even computer code -- amounts to implicitly training on the test set. Thus error rate on test data may not reflect the true population error. This observation initiated {\em adaptive data analysis}, which provides evaluation mechanisms with guaranteed upper bounds on this difference. With statistical query (i.e. test accuracy) feedbacks, the best upper bound is fairly pessimistic: the deviation can hit a practically vacuous value if the number of models tested is quadratic in the size of the test set. In this work, we present a simple new estimate, {\em Rip van Winkle's Razor}. It relies upon a new notion of \textquotedblleft information content\textquotedblright\ of a model: the amount of information that would have to be provided to an expert referee who is intimately familiar with the field and relevant science/math, and who has been just been woken up after falling asleep at the moment of the creation of the test data (like \textquotedblleft Rip van Winkle\textquotedblright\ of the famous fairy tale). This notion of information content is used to provide an estimate of the above deviation which is shown to be non-vacuous in many modern settings.
翻译:传统统计数据不允许在培训期间使用测试数据( a.k.a.a. holdout 数据) 。 Dwork 等人(2015年) 指出,目前在机器学习方面的做法,即研究人员利用彼此的模型、复制超参数甚至计算机代码 -- -- 相当于对测试集进行隐含的培训。因此,测试数据中的错误率可能无法反映真正的人口误差。这种观察为评估机制提供了一个关于这一差异的有保障的现代上限。在统计查询(即测试准确性)反馈方面,最好的上层界限是相当悲观的:如果所测试的模型的数量在测试集的大小中具有四分性,则偏离实际上会打乱价值。在这项工作中,我们提出了一个简单的新估计,即 em Rip van Winkle's Razor} 。 它依赖于一种新的概念, 即: textclocktblebleflef 内容\ textclight\ textbrightright of a model preview files ride: rippled the more droppleglegn droppled das the froppled dal droppled droppled dropplem droppled the mated the produft drippled drippled the s pripplem dism frip rippled the mated the prip rip ropple rift roppled roft rift led dismationd roft roft roft roppled roft roft roft roft rogledd rod rod rod rifle rift rift roft roft roft roft roft roft leddddddddddddd roppd roppd roppd roppd roppd roppd rod rod rod ro) rod rod rod rodddd rod rodddd rod ro rod ro