We propose the family of generalized resubstitution classifier error estimators based on empirical measures. These error estimators are computationally efficient and do not require re-training of classifiers. The plain resubstitution error estimator corresponds to choosing the standard empirical measure. Other choices of empirical measure lead to bolstered, posterior-probability, Gaussian-process, and Bayesian error estimators; in addition, we propose bolstered posterior-probability error estimators as a new family of generalized resubstitution estimators. In the two-class case, we show that a generalized resubstitution estimator is consistent and asymptotically unbiased, regardless of the distribution of the features and label, if the corresponding generalized empirical measure converges uniformly to the standard empirical measure and the classification rule has a finite VC dimension. A generalized resubstitution estimator typically has hyperparameters that can be tuned to control its bias and variance, which adds flexibility. Numerical experiments with various classification rules trained on synthetic data assess the finite-sample performance of several representative generalized resubstitution error estimators. In addition, results of an image classification experiment using the LeNet-5 convolutional neural network and the MNIST data set demonstrate the potential of this class of error estimators in deep learning for computer vision.
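To make the relationship between plain and bolstered resubstitution concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the estimators as implemented in the paper. Bolstering is approximated by Monte Carlo sampling from a spherical Gaussian kernel centered at each training point. The kernel width `sigma`, the sample count `n_mc`, the toy data, and the fixed-threshold classifier are all hypothetical choices made for this example.

```python
import random


def resubstitution_error(classify, points, labels):
    """Plain resubstitution: fraction of training points that are misclassified."""
    wrong = sum(classify(x) != y for x, y in zip(points, labels))
    return wrong / len(points)


def bolstered_resubstitution_error(classify, points, labels, sigma, n_mc=200, seed=0):
    """Monte Carlo sketch of bolstered resubstitution.

    Each training point is replaced by a spherical Gaussian kernel with
    standard deviation `sigma` (an assumed hyperparameter). The estimate is
    the average fraction of kernel samples whose predicted label differs
    from the label of the point the kernel is centered on.
    """
    rng = random.Random(seed)
    total = 0.0
    for x, y in zip(points, labels):
        wrong = 0
        for _ in range(n_mc):
            # Perturb every coordinate of the training point independently.
            z = [xj + rng.gauss(0.0, sigma) for xj in x]
            wrong += classify(z) != y
        total += wrong / n_mc
    return total / len(points)


# Hypothetical one-dimensional toy data (not from the paper).
points = [[-2.0], [-1.0], [-0.1], [0.4], [-0.3], [0.2], [1.0], [2.0]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]


def threshold_classifier(z):
    """Stand-in for an already trained classifier: predict 1 if the feature is positive."""
    return int(z[0] > 0.0)


plain = resubstitution_error(threshold_classifier, points, labels)
bolstered = bolstered_resubstitution_error(threshold_classifier, points, labels, sigma=0.5)
print(f"plain resubstitution:     {plain:.3f}")
print(f"bolstered resubstitution: {bolstered:.3f}")
```

With `sigma=0.0` every kernel sample coincides with its training point, so the sketch reduces to plain resubstitution. A larger `sigma` spreads more mass across the decision boundary, which illustrates how a hyperparameter can trade bias against variance. Neither function re-trains the classifier; both only evaluate it.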