Data augmentation by incorporating cheap unlabeled data from multiple domains is a powerful way to improve prediction, especially when labeled data are limited. In this work, we investigate how adversarial robustness can be enhanced by leveraging out-of-domain unlabeled data. We demonstrate that for broad classes of distributions and classifiers, there exists a sample complexity gap between standard and robust classification. We quantify the extent to which this gap can be bridged by leveraging unlabeled samples from a shifted domain, providing both upper and lower bounds. Moreover, we identify settings in which we achieve better adversarial robustness when the unlabeled data come from a shifted domain rather than from the same domain as the labeled data. We also investigate how to leverage out-of-domain data when some structural information, such as sparsity, is shared between the labeled and unlabeled domains. Experimentally, we augment two object recognition datasets (CIFAR-10 and SVHN) with easy-to-obtain unlabeled out-of-domain data and demonstrate substantial improvement in the model's robustness against $\ell_\infty$ adversarial attacks on the original domain.
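The $\ell_\infty$ attacks evaluated in the abstract are commonly instantiated as projected gradient descent (PGD) within an $\epsilon$-ball around the clean input. The sketch below is illustrative only, not the paper's method: the `pgd_linf_attack` helper and the toy logistic model are assumed names for exposition.

```python
import numpy as np

def pgd_linf_attack(grad_fn, x, y, eps=0.1, alpha=0.02, steps=10):
    """Minimal l_inf PGD sketch (illustrative, not the paper's code).

    grad_fn(x, y) returns the gradient of the loss w.r.t. the input x.
    Each step ascends in the gradient's sign direction (steepest ascent
    under l_inf) and projects back into the eps-ball around x.
    """
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv, y))
        # project onto the l_inf ball of radius eps around the clean x
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv

# Toy example: logistic loss log(1 + exp(-y * w.x)) for a fixed linear w.
w = np.array([1.0, -2.0, 0.5])

def logistic_grad(x, y):
    # d/dx log(1 + exp(-y * w.x)) = -y * w / (1 + exp(y * w.x))
    margin = y * np.dot(w, x)
    return -y * w / (1.0 + np.exp(margin))

x = np.array([0.2, -0.1, 0.3])
y = 1.0
x_adv = pgd_linf_attack(logistic_grad, x, y, eps=0.1)

# The perturbation stays inside the l_inf budget ...
assert np.max(np.abs(x_adv - x)) <= 0.1 + 1e-9
# ... and the classifier's margin shrinks under attack.
assert y * np.dot(w, x_adv) < y * np.dot(w, x)
```

Robust (adversarial) training then minimizes the loss at `x_adv` rather than `x`; the sample complexity gap discussed above concerns how much data this inner maximization effectively demands.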