While additional training data improves the robustness of deep neural networks against adversarial examples, curating a large number of specific real-world samples is challenging. We circumvent this challenge by using additional data from proxy distributions learned by state-of-the-art generative models. We first seek to formally understand the transfer of robustness from classifiers trained on proxy distributions to the real data distribution. We prove that the difference between the robustness of a classifier on the two distributions is upper bounded by the conditional Wasserstein distance between them. Motivated by this result, we next ask: how can we empirically select an appropriate generative model? We find that existing distance metrics, such as FID, fail to correctly determine the robustness transfer from proxy distributions. We propose a robust discrimination approach, which measures the distinguishability of synthetic and real samples under adversarial perturbations. Our approach accurately predicts the robustness transfer from different proxy distributions. After choosing a proxy distribution, the next question is: which samples are most beneficial? We optimize this selection by estimating the importance of each sample in robustness transfer. Finally, using our selection criteria for proxy distributions and individual samples, we curate a set of the ten million most beneficial synthetic samples for robust training on the CIFAR-10 dataset. Using this set, we improve robust accuracy by up to 7.5% and 6.7% in the $\ell_{\infty}$ and $\ell_2$ threat models, respectively, and certified robust accuracy by 7.6% in the $\ell_2$ threat model, over baselines that do not use proxy distributions.
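The robustness-transfer bound stated above can be made concrete with a short derivation. The definitions here are assumptions chosen for illustration and may differ from the paper's exact formulation. Average robustness is taken as the expected distance to the nearest misclassified input, both distributions are assumed to share the same label marginal $\mathcal{D}_Y$, and every label is assumed to have at least one misclassified input so that $\rho_y$ is finite. $W_d$ denotes the 1-Wasserstein distance with ground cost $d$.

```latex
\begin{align*}
\rho_y(x) &= \inf_{x':\,h(x') \neq y} d(x, x'), \qquad
  \mathrm{Rob}_d(h, \mathcal{D}) = \mathbb{E}_{(x, y) \sim \mathcal{D}}\bigl[\rho_y(x)\bigr] \\
\bigl|\rho_y(x_1) - \rho_y(x_2)\bigr| &\le d(x_1, x_2)
  && \text{(triangle inequality)} \\
\bigl|\mathbb{E}_{x \sim \mathcal{D}_{X|y}} \rho_y(x) - \mathbb{E}_{\tilde{x} \sim \tilde{\mathcal{D}}_{X|y}} \rho_y(\tilde{x})\bigr|
  &\le \inf_{J}\, \mathbb{E}_{(x, \tilde{x}) \sim J}\, d(x, \tilde{x})
  = W_d\bigl(\mathcal{D}_{X|y}, \tilde{\mathcal{D}}_{X|y}\bigr)
  && \text{(any coupling } J \text{ of the two conditionals)} \\
\bigl|\mathrm{Rob}_d(h, \mathcal{D}) - \mathrm{Rob}_d(h, \tilde{\mathcal{D}})\bigr|
  &\le \mathbb{E}_{y \sim \mathcal{D}_Y}\Bigl[W_d\bigl(\mathcal{D}_{X|y}, \tilde{\mathcal{D}}_{X|y}\bigr)\Bigr]
  =: \mathrm{cwd}_d(\mathcal{D}, \tilde{\mathcal{D}})
  && \text{(average over } y \text{)}
\end{align*}
```

The key step is that $\rho_y$ is 1-Lipschitz in $x$, so matching real and proxy samples of the same class through any coupling can change the expected robustness by at most the expected transport cost.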
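The robust discrimination idea, measuring how well real and synthetic samples can be told apart under adversarial perturbations, can be illustrated on toy data. This is a minimal sketch, not the paper's implementation: it assumes a linear logistic discriminator on 2-D Gaussian blobs (hypothetical stand-ins for images), for which the worst-case $\ell_\infty$ attack of size `eps` has a closed form. The helper names (`make_blob`, `robust_margin`, `train_robust_discriminator`, `robust_accuracy`) and all parameters are hypothetical.

```python
import math
import random


def make_blob(rng, n, mean, std=1.0):
    """Draw n 2-D points from an isotropic Gaussian (toy stand-in for image data)."""
    return [[rng.gauss(mean[0], std), rng.gauss(mean[1], std)] for _ in range(n)]


def robust_margin(w, b, x, s, eps):
    """Worst-case signed margin of f(x) = w.x + b under an l_inf perturbation of size eps.

    For a linear model the optimal attack lowers s * f(x) by exactly eps * ||w||_1.
    """
    f = sum(wi * xi for wi, xi in zip(w, x)) + b
    return s * f - eps * sum(abs(wi) for wi in w)


def logistic_loss_grad(m):
    """Numerically stable derivative of log(1 + exp(-m)) with respect to m."""
    if m >= 0:
        e = math.exp(-m)
        return -e / (1.0 + e)
    return -1.0 / (1.0 + math.exp(m))


def train_robust_discriminator(real, synthetic, eps, epochs=200, lr=0.1):
    """Adversarially train a linear logistic discriminator (+1 = synthetic, -1 = real)."""
    data = [(x, -1) for x in real] + [(x, 1) for x in synthetic]
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, s in data:
            g = logistic_loss_grad(robust_margin(w, b, x, s, eps))
            for i in range(dim):
                # Subgradient of the robust margin w.r.t. w_i: s * x_i - eps * sign(w_i).
                sign = (w[i] > 0) - (w[i] < 0)
                gw[i] += g * (s * x[i] - eps * sign)
            gb += g * s
        # Full-batch gradient descent step on the average adversarial loss.
        w = [wi - lr * gi / len(data) for wi, gi in zip(w, gw)]
        b -= lr * gb / len(data)
    return w, b


def robust_accuracy(w, b, real, synthetic, eps):
    """Fraction of samples still classified correctly under the worst-case l_inf attack."""
    data = [(x, -1) for x in real] + [(x, 1) for x in synthetic]
    return sum(robust_margin(w, b, x, s, eps) > 0 for x, s in data) / len(data)


rng = random.Random(0)
real = make_blob(rng, 200, (0.0, 0.0))
proxies = {"close": make_blob(rng, 200, (0.2, 0.0)), "far": make_blob(rng, 200, (3.0, 3.0))}
budgets = (0.0, 0.25, 0.5)
results = {}
for name, proxy in proxies.items():
    w, b = train_robust_discriminator(real, proxy, eps=0.1)
    results[name] = [robust_accuracy(w, b, real, proxy, e) for e in budgets]
    print(name, [round(a, 3) for a in results[name]])
```

In this toy setting, a lower robust discrimination accuracy means the proxy is harder to separate from real data; under the reasoning in the abstract, such a proxy would be expected to transfer robustness better. For deep discriminators the closed-form margin would have to be replaced by an iterative attack such as PGD.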
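Per-sample selection can be sketched once an importance score is fixed. The criterion below is an assumption for illustration and not necessarily the paper's: each synthetic sample is scored by a discriminator's estimated probability that it is real, and the top `k` are kept. The discriminator parameters `w`, `b` and the helpers `p_real` and `select_most_beneficial` are hypothetical.

```python
import heapq
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def p_real(w, b, x):
    """Probability that x is real, for a linear discriminator where w.x + b > 0 means 'synthetic'."""
    f = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(-f)


def select_most_beneficial(synthetic, w, b, k):
    """Keep the k synthetic samples the discriminator scores as most real-looking.

    Hypothetical importance criterion: p_real is used as the per-sample importance.
    """
    return heapq.nlargest(k, synthetic, key=lambda x: p_real(w, b, x))


# Hypothetical discriminator parameters: real data sits near the origin.
w, b = [1.0, 1.0], -3.0
synthetic = [[0.0, 0.0], [4.0, 4.0], [1.0, 0.5], [2.5, 2.5], [-1.0, 0.0]]
chosen = select_most_beneficial(synthetic, w, b, k=2)
print(chosen)  # [[-1.0, 0.0], [0.0, 0.0]]
```

`heapq.nlargest` keeps only a heap of `k` items while consuming an iterable, so the same selection step could stream scores over a very large pool of generated samples without holding all scores in memory.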