Recent studies have shown that deep neural networks (DNNs) are vulnerable to various attacks, including evasion attacks and poisoning attacks. On the defense side, there has been intensive interest in provable robustness against evasion attacks. In this paper, we focus on improving model robustness against more diverse threat models. Specifically, we provide the first unified framework using a smoothing functional to certify model robustness against general adversarial attacks. In particular, we propose the first robust training process, RAB, to certify robustness against backdoor attacks. We theoretically prove the robustness bound for machine learning models trained with RAB, analyze the tightness of this bound, and propose different smoothing noise distributions such as Gaussian and uniform distributions. Moreover, we evaluate the certified robustness of a family of "smoothed" DNNs trained in a differentially private fashion. In addition, we show theoretically that for simpler models such as K-nearest neighbor (KNN) classifiers, the robust smoothed models can be trained efficiently; for K=1, we propose an exact algorithm that smooths the training process, eliminating the need to sample from a noise distribution. Empirically, we conduct comprehensive experiments with different machine learning models, including DNNs, differentially private DNNs, and KNN models, on the MNIST, CIFAR-10, and ImageNet datasets to provide the first benchmark for certified robustness against backdoor attacks. In particular, we also evaluate KNN models on the spambase tabular dataset to demonstrate their advantages. Both the theoretical analysis of certified model robustness against arbitrary backdoors and the comprehensive benchmark on diverse ML models and datasets shed light on further robust learning strategies against training-time, or even general, adversarial attacks on ML models.
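To make the smoothed-training idea concrete, the following is a minimal sketch of the general approach the abstract describes: train an ensemble of base classifiers on independently noise-perturbed copies of the (potentially backdoored) training set and aggregate the ensemble's votes at test time, from whose margin a certification bound can be derived. All names here (smooth_train, smooth_predict, sigma, n_models) and the choice of logistic regression as the base learner are illustrative assumptions, not the paper's actual RAB implementation or certification procedure.

```python
# Sketch of smoothed training against backdoor attacks under the stated assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

def smooth_train(X, y, n_models=100, sigma=1.0, seed=0):
    """Train an ensemble in which each member sees the training set
    perturbed by i.i.d. Gaussian smoothing noise (a uniform distribution
    could be substituted, as the abstract suggests)."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        noise = rng.normal(0.0, sigma, size=X.shape)  # Gaussian smoothing noise
        models.append(LogisticRegression(max_iter=1000).fit(X + noise, y))
    return models

def smooth_predict(models, x, sigma=1.0, seed=1):
    """Aggregate the smoothed ensemble's votes on a test point; the gap
    between the top two vote counts is what a robustness certificate
    would be computed from."""
    rng = np.random.default_rng(seed)
    votes = {}
    for model in models:
        noise = rng.normal(0.0, sigma, size=x.shape)
        pred = int(model.predict((x + noise).reshape(1, -1))[0])
        votes[pred] = votes.get(pred, 0) + 1
    return max(votes, key=votes.get), votes

if __name__ == "__main__":
    # Toy usage: two Gaussian blobs stand in for a real (possibly poisoned) dataset.
    rng = np.random.default_rng(42)
    X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
    y = np.array([0] * 50 + [1] * 50)
    models = smooth_train(X, y, n_models=50, sigma=0.5)
    label, votes = smooth_predict(models, np.array([1.5, 1.5]), sigma=0.5)
    print(label, votes)
```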