Not all examples are created equal, yet standard deep neural network training protocols treat every training point uniformly: each example is propagated forward and backward through the network the same number of times, regardless of how much it contributes to learning. Recent work has proposed accelerating training by deviating from this uniform treatment. Popular methods up-weight examples that contribute more to the loss, with the intuition that low-loss examples have already been learned by the model, so their marginal value to the training procedure should be lower. This view assumes that updating the model on high-loss examples is beneficial. However, this assumption may not hold for noisy, real-world data, where high loss often signals a corrupted label rather than a useful example. In this paper, we theorize and then empirically demonstrate that loss-based acceleration methods degrade in scenarios with noisy and corrupted data. Our results suggest that measures of example difficulty need to correctly separate noise from other types of challenging examples.
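To make the family of loss-based acceleration schemes discussed above concrete, the sketch below implements one representative rule in PyTorch: score each batch with a cheap forward pass, then backpropagate only the highest-loss fraction. This is a minimal illustration under assumed names (`train_epoch_loss_based`, `top_frac`, and the `model`/`loader`/`optimizer` arguments are hypothetical), not the method of any specific prior work; it simply shows why such rules are vulnerable to noise, since under label corruption the selected high-loss examples are often the corrupted ones.

```python
import torch
import torch.nn as nn

def train_epoch_loss_based(model, loader, optimizer, top_frac=0.5):
    """One epoch of loss-based selection (illustrative sketch).

    Each batch is scored with a gradient-free forward pass, and only the
    top_frac highest-loss examples receive a gradient update. Under label
    noise, these 'hardest' examples may simply be mislabeled.
    """
    criterion = nn.CrossEntropyLoss(reduction="none")  # per-example losses
    for inputs, targets in loader:
        # Cheap scoring pass: no gradients needed to rank examples by loss.
        with torch.no_grad():
            losses = criterion(model(inputs), targets)

        # Keep only the k highest-loss examples in the batch.
        k = max(1, int(top_frac * len(losses)))
        top_idx = torch.topk(losses, k).indices

        # Full forward/backward pass on the selected subset only.
        optimizer.zero_grad()
        selected_loss = criterion(model(inputs[top_idx]), targets[top_idx]).mean()
        selected_loss.backward()
        optimizer.step()
```

The design choice this sketch exposes is exactly the assumption the paper questions: the selection rule treats loss as a proxy for marginal learning value, so it cannot distinguish a genuinely difficult example from a noisy one.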