Deep convolutional neural networks accurately classify a diverse range of natural images, but they can be easily deceived when carefully designed, imperceptible perturbations are embedded in the images. In this paper, we design a multi-pronged system combining training, input transformation, and image ensembling that is attack-agnostic and not easily estimated. Our system incorporates two novel features. The first is a transformation layer that computes feature-level polynomial kernels from class-level training data samples and, at inference time, iteratively updates copies of the input image based on their feature kernel differences to create an ensemble of transformed inputs. The second is a classification system that combines the prediction of the undefended network with a hard vote on the ensemble of filtered images. Our evaluations on the CIFAR10 dataset show that our system improves the robustness of an undefended network against a variety of bounded and unbounded white-box attacks under different distance metrics, while sacrificing little accuracy on clean images. Against adaptive full-knowledge attackers creating end-to-end attacks, our system successfully augments the existing robustness of adversarially trained networks, to which our methods are most effectively applied.
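As a rough illustration of two ingredients named in the abstract, the sketch below shows a standard polynomial kernel on feature vectors and one possible hard-vote rule. It is not the authors' implementation. The function names, the kernel parameters `degree` and `coef0`, the choice to count the undefended network's prediction as one extra vote, and the tie-breaking rule are all assumptions made for this example; the abstract does not specify them.

```python
import numpy as np


def polynomial_kernel(a, b, degree=2, coef0=1.0):
    """Standard polynomial kernel: K[i, j] = (a_i . b_j + coef0) ** degree.

    `a` and `b` hold one feature vector per row. The degree and offset
    are illustrative defaults, not values taken from the paper.
    """
    a = np.atleast_2d(np.asarray(a, dtype=float))
    b = np.atleast_2d(np.asarray(b, dtype=float))
    return (a @ b.T + coef0) ** degree


def hard_vote(undefended_label, ensemble_labels, num_classes):
    """Combine per-image class labels by majority (hard) vote.

    Assumption: the undefended network's label counts as one extra vote.
    Assumption: ties go to the undefended label if it is among the tied
    classes, otherwise to the lowest class index.
    """
    votes = np.bincount(np.asarray(ensemble_labels, dtype=int),
                        minlength=num_classes)
    votes[undefended_label] += 1
    winners = np.flatnonzero(votes == votes.max())
    if undefended_label in winners:
        return int(undefended_label)
    return int(winners[0])


if __name__ == "__main__":
    # Kernel between two toy feature vectors.
    print(polynomial_kernel([[1.0, 2.0]], [[3.0, 4.0]]))
    # Three transformed copies vote; the undefended network predicts class 3.
    print(hard_vote(3, [1, 1, 2], num_classes=10))
```

In this sketch, an ensemble that agrees on a class can override the undefended prediction, while a split ensemble falls back to it. Other weightings are equally plausible readings of the abstract.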