There is a rising interest in studying the robustness of deep neural network classifiers against adversaries, with both advanced attack and defence techniques being actively developed. However, most recent work focuses on discriminative classifiers, which only model the conditional distribution of the labels given the inputs. In this paper we propose the deep Bayes classifier, which improves classical naive Bayes with conditional deep generative models. We further develop detection methods for adversarial examples, which reject inputs whose negative log-likelihood under the generative model exceeds a threshold pre-specified using training data. Experimental results suggest that deep Bayes classifiers are more robust than deep discriminative classifiers, and the proposed detection methods achieve high detection rates against many recently proposed attacks.
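The two ideas in the abstract, classification by Bayes' rule and rejection by a likelihood threshold, can be illustrated with a minimal sketch. It assumes the per-class log-likelihoods log p(x|y) have already been produced by some conditional generative model, which is not shown. The function names and the "mean plus k standard deviations" threshold rule are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

def bayes_predict(log_px_given_y, log_prior):
    """Classify with Bayes' rule from per-class log-likelihoods.

    log_px_given_y: (n, K) array, entry [i, c] is log p(x_i | y = c),
                    assumed to come from a conditional generative model.
    log_prior:      (K,) array of log p(y = c).
    """
    log_joint = log_px_given_y + log_prior             # log p(x, y)
    # Numerically stable log-sum-exp over classes gives log p(x).
    m = log_joint.max(axis=1, keepdims=True)
    log_marginal = m[:, 0] + np.log(np.exp(log_joint - m).sum(axis=1))
    log_posterior = log_joint - log_marginal[:, None]  # log p(y | x)
    return log_joint.argmax(axis=1), log_posterior, log_marginal

def fit_threshold(train_log_marginal, k=2.0):
    """Assumed rule: mean + k * std of the training negative log-likelihoods."""
    nll = -train_log_marginal
    return nll.mean() + k * nll.std()

def detect(log_marginal, threshold):
    """Flag inputs whose negative log-likelihood exceeds the threshold."""
    return -log_marginal > threshold
```

In this sketch the threshold is fixed once from training data, and a test input is rejected whenever its negative log-likelihood under the model exceeds that value.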