We investigate the classification performance of K-nearest neighbors (K-NN) and deep neural networks (DNNs) in the presence of label noise. We first show empirically that a DNN's prediction for a given test example depends on the labels of the training examples in its local neighborhood. This motivates us to derive a realizable analytic expression that approximates the multi-class K-NN classification error in the presence of label noise, which is of independent importance. We then suggest that this expression for K-NN may serve as a first-order approximation of the DNN error. Finally, we demonstrate empirically the proximity of the developed expression to the observed performance of K-NN and DNN classifiers. Our result may explain the previously observed, surprising resistance of DNNs to some types of label noise. It also characterizes an important factor of this resistance: the more concentrated the noise, the greater the degradation in performance.
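To illustrate the phenomenon the abstract studies, the following is a minimal, self-contained sketch (not the paper's method): a K-NN classifier on a synthetic 1-D two-class problem whose training labels are flipped with some probability, so the empirical test error can be compared across noise rates. All names (`knn_predict`, `knn_error`) and the data distribution are illustrative assumptions.

```python
import random
from collections import Counter

def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (1-D distance)."""
    neighbors = sorted(train, key=lambda t: abs(t[0] - x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def knn_error(noise_rate, k=5, n_train=500, n_test=500, seed=0):
    """Empirical K-NN test error on a two-class Gaussian problem with
    symmetric label noise applied only to the training set."""
    rng = random.Random(seed)

    def sample(n):
        # Class 0 centered at 0, class 1 centered at 2, unit variance.
        pts = []
        for _ in range(n):
            y = rng.randint(0, 1)
            pts.append((rng.gauss(2.0 * y, 1.0), y))
        return pts

    # Flip each training label independently with probability noise_rate.
    train = [(x, 1 - y if rng.random() < noise_rate else y)
             for x, y in sample(n_train)]
    test = sample(n_test)  # test labels are kept clean
    wrong = sum(knn_predict(train, x, k) != y for x, y in test)
    return wrong / n_test
```

Running `knn_error` at increasing noise rates shows the expected degradation of K-NN accuracy as the training labels become noisier, which is the quantity the paper's analytic expression approximates.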