Recent studies have shown that deep neural networks (DNNs) are vulnerable to adversarial attacks, including evasion and backdoor (poisoning) attacks. On the defense side, there have been intensive efforts to improve both empirical and provable robustness against evasion attacks; however, provable robustness against backdoor attacks remains largely unexplored. In this paper, we focus on certifying machine learning model robustness against general threat models, especially backdoor attacks. We first provide a unified framework via randomized smoothing techniques and show how it can be instantiated to certify robustness against both evasion and backdoor attacks. We then propose the first robust training process, RAB, to smooth the trained model and certify its robustness against backdoor attacks. We prove the robustness bound for machine learning models trained with RAB and show that this bound is tight. In addition, we theoretically show that it is possible to train robust smoothed models efficiently for simple models such as K-nearest neighbor (K-NN) classifiers, and we propose an exact smooth-training algorithm that eliminates the need to sample from a noise distribution for such models. Empirically, we conduct comprehensive experiments on different machine learning (ML) models such as DNNs, support vector machines, and K-NN models on the MNIST, CIFAR-10, and ImageNette datasets, and provide the first benchmark for certified robustness against backdoor attacks. In addition, we evaluate K-NN models on the spambase tabular dataset to demonstrate the advantages of the proposed exact algorithm. Both the theoretical analysis and the comprehensive evaluation on diverse ML models and datasets shed light on further robust learning strategies against general training-time attacks.
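The randomized smoothing idea underlying the unified framework can be sketched as follows: a smoothed classifier predicts the class most likely under random perturbations of the input, i.e. g(x) = argmax_c P[f(x + ε) = c] with ε ~ N(0, σ²I), estimated by Monte Carlo voting. This is a minimal illustrative sketch, not the paper's RAB training procedure; `smoothed_predict` and the toy base classifier are hypothetical names introduced here.

```python
import numpy as np

def smoothed_predict(base_classifier, x, sigma=0.5, n_samples=1000, rng=None):
    """Monte Carlo estimate of the smoothed prediction
    g(x) = argmax_c P[f(x + eps) = c], with eps ~ N(0, sigma^2 I)."""
    rng = np.random.default_rng(0) if rng is None else rng
    votes = {}
    for _ in range(n_samples):
        noisy = x + rng.normal(0.0, sigma, size=x.shape)
        label = base_classifier(noisy)  # query the base model on a noisy copy
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)  # majority vote over noise samples

# Toy 1-D decision rule standing in for a trained model:
# predict class 1 iff the coordinate sum is positive.
f = lambda x: int(x.sum() > 0)
print(smoothed_predict(f, np.array([2.0, 1.0])))  # → 1 (point far from the boundary)
```

The margin by which the majority class wins the vote is what the certification machinery converts into a robustness radius; RAB extends this idea from test-time noise to smoothing over perturbed training sets.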