We introduce active testing: a new framework for sample-efficient model evaluation. While approaches like active learning reduce the number of labels needed for model training, existing literature largely ignores the cost of labeling test data, typically making the unrealistic assumption that large labeled test sets are available for model evaluation. This creates a disconnect from real applications, where test labels are important and just as expensive as training labels, e.g. when optimizing hyperparameters. Active testing addresses this by carefully selecting which test points to label, ensuring that model evaluation is sample-efficient. To this end, we derive theoretically grounded and intuitive acquisition strategies that are specifically tailored to the goals of active testing, noting that these are distinct from those of active learning. Actively selecting labels introduces a bias; we show how to remove this bias while simultaneously reducing the variance of the estimator. Active testing is easy to implement, effective, and can be applied to any supervised machine learning method. We demonstrate this on models including WideResNets and Gaussian processes, on datasets including CIFAR-100.
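To make the bias-correction idea concrete, the sketch below illustrates the general principle in its simplest form: test points are drawn according to an acquisition distribution that favors informative points, and each acquired loss is importance-weighted so the estimate of the mean test loss stays unbiased. This is a minimal illustration only, not the paper's actual estimator or acquisition strategy (the paper samples without replacement and uses tailored acquisition functions); all names here, such as `active_test_risk` and the noisy loss surrogate, are hypothetical.

```python
import numpy as np

def active_test_risk(losses, q, M, seed=None):
    """Importance-weighted estimate of the mean test loss (illustrative).

    losses : per-point losses L_i over the N-point test pool (in practice
             only the M acquired labels would be revealed; the full array
             is used here to keep the demo self-contained).
    q      : acquisition distribution over the N pool points.
    M      : number of test labels acquired.
    """
    rng = np.random.default_rng(seed)
    N = len(losses)
    # Draw M points i.i.d. from the acquisition distribution q.
    idx = rng.choice(N, size=M, replace=True, p=q)
    # Weight each acquired loss by 1 / (N * q_i): this corrects the
    # selection bias, so the estimator is unbiased for (1/N) * sum_i L_i.
    weights = 1.0 / (N * q[idx])
    return np.mean(weights * losses[idx])

# Demo: acquiring proportionally to a noisy surrogate of the loss
# concentrates labels on hard points, yet the estimate stays unbiased.
rng = np.random.default_rng(0)
losses = rng.exponential(size=10_000)                    # true per-point losses
surrogate = np.clip(losses + rng.normal(0.0, 0.2, size=losses.size), 1e-3, None)
q = surrogate / surrogate.sum()
print(active_test_risk(losses, q, M=100, seed=1), losses.mean())
```

With a surrogate that correlates with the true loss, the weighted estimator typically has lower variance than uniform subsampling with the same label budget, which is the intuition behind tailoring the acquisition distribution to the evaluation objective.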