This paper introduces the bias classifier: the bias part of a DNN with ReLU activations, used as a classifier in its own right. The work is motivated by the fact that the bias part is a piecewise constant function with zero gradient, so gradient-based methods for generating adversarial examples, such as FGSM, cannot attack it directly. The existence of the bias classifier is proved, and an effective training method for it is given. It is also proved that adding a suitable random first-degree part to the bias classifier yields a classifier that is information-theoretically safe against the original-model gradient attack, in the sense that the attack generates a totally random attack direction. This appears to be the first time the concept of an information-theoretically safe classifier has been proposed. Several attack methods against the bias classifier are proposed, and numerical experiments show that in most cases the bias classifier is more robust against these attacks than DNNs of similar size.
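As an illustration of why the bias part is piecewise constant, the following minimal sketch splits a tiny ReLU network into a first-degree part and a bias part on the linear region containing a given input. It assumes that "bias part" means the constant term B_x in the local affine form F(x) = A_x x + B_x; the paper's exact definition and its training method are not reproduced here. The weights in `LAYERS` and the helper names (`split_linear_bias`, `bias_classify`) are hypothetical and chosen only for the example.

```python
# Sketch only: assumes the "bias part" is B_x in F(x) = A_x x + B_x.
# All weights and function names below are illustrative, not from the paper.

def matvec(W, v):
    """Matrix-vector product for list-of-lists matrices."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def matmul(A, B):
    """Matrix-matrix product for list-of-lists matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def forward(layers, x):
    """Ordinary forward pass; ReLU after every layer except the last."""
    h = x
    for i, (W, b) in enumerate(layers):
        h = [z + c for z, c in zip(matvec(W, h), b)]
        if i < len(layers) - 1:
            h = [max(0.0, z) for z in h]
    return h

def split_linear_bias(layers, x):
    """Return (A, B) such that F(x) = A x + B on the linear region of x."""
    n = len(x)
    A = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    B = [0.0] * n
    h = x
    for i, (W, b) in enumerate(layers):
        # Push the affine map h = A x + B through the layer W h + b.
        A = matmul(W, A)
        B = [z + c for z, c in zip(matvec(W, B), b)]
        h = [z + c for z, c in zip(matvec(W, h), b)]
        if i < len(layers) - 1:
            # The ReLU activation pattern at x fixes the linear region.
            mask = [1.0 if z > 0 else 0.0 for z in h]
            A = [[m * a for a in row] for m, row in zip(mask, A)]
            B = [m * c for m, c in zip(mask, B)]
            h = [m * z for m, z in zip(mask, h)]
    return A, B

def bias_classify(layers, x):
    """Predict the class with the largest bias-part component."""
    _, B = split_linear_bias(layers, x)
    return max(range(len(B)), key=lambda k: B[k])

# Hypothetical 2-3-2 network.
LAYERS = [
    ([[1.0, -1.0], [0.5, 2.0], [-1.0, 1.0]], [0.5, -1.0, 0.25]),
    ([[1.0, -2.0, 0.5], [-1.0, 1.0, 1.0]], [0.1, -0.3]),
]

x = [1.0, 0.2]
x_near = [1.01, 0.19]  # small perturbation that keeps the same activation pattern

A, B = split_linear_bias(LAYERS, x)
_, B_near = split_linear_bias(LAYERS, x_near)
print("network output:", forward(LAYERS, x))
print("bias part at x:", B)
print("bias part unchanged nearby:", B == B_near)
print("bias classifier label:", bias_classify(LAYERS, x))
```

Within one linear region the bias part depends only on the activation pattern and the weights, not on the input itself, so its gradient with respect to the input is zero there. That is the property the abstract refers to when it says a method such as FGSM cannot attack the bias part directly.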