Several intriguing problems persist in adversarial training, including the robustness-accuracy trade-off, robust overfitting, and gradient masking, posing great challenges to both reliable evaluation and practical deployment. Here, we show that these problems share one common cause: low-quality samples in the dataset. We first identify an intrinsic property of the data, called the problematic score, and then design controlled experiments to investigate its connections with these problems. Specifically, we find that when problematic data is removed, robust overfitting and gradient masking can be largely alleviated, and that the robustness-accuracy trade-off is more prominent for datasets containing highly problematic data. These observations not only verify our intuition about data quality but also open new opportunities to advance adversarial training. Remarkably, simply removing problematic data before adversarial training, despite making the training set smaller, consistently yields better robustness across different adversary settings, training methods, and neural architectures.
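To make the data-filtering idea concrete, the following is a minimal sketch (in PyTorch, not the authors' implementation) of removing high-score examples before standard PGD adversarial training. The per-example `problematic_score` array, the `threshold`, and all hyperparameters are hypothetical placeholders, since the abstract does not specify how the score is computed or which cutoff is used.

```python
# Minimal sketch, assuming a precomputed per-example "problematic score".
# The scoring function itself is NOT defined here (hypothetical input).
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, Subset

def filter_dataset(dataset, problematic_score, threshold):
    """Keep only examples whose problematic score is below the threshold."""
    keep = [i for i, s in enumerate(problematic_score) if s < threshold]
    return Subset(dataset, keep)

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Standard L-infinity PGD for crafting training-time adversarial examples."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def adversarial_train(model, dataset, problematic_score, threshold,
                      epochs=1, device="cpu"):
    """Adversarial training on the filtered (smaller) training set."""
    loader = DataLoader(filter_dataset(dataset, problematic_score, threshold),
                        batch_size=128, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=0.1,
                          momentum=0.9, weight_decay=5e-4)
    model.to(device).train()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            x_adv = pgd_attack(model, x, y)
            opt.zero_grad()
            F.cross_entropy(model(x_adv), y).backward()
            opt.step()
```

The only change relative to standard adversarial training is the `filter_dataset` step; the attack and optimization loop are otherwise conventional.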