We propose a new defense mechanism against adversarial attacks, based on an optical co-processor, providing robustness without compromising natural accuracy in both white-box and black-box settings. This hardware co-processor performs a nonlinear fixed random transformation whose parameters are unknown and impossible to retrieve with sufficient precision for large enough dimensions. In the white-box setting, our defense works by obfuscating the parameters of the random projection. Unlike other defenses relying on obfuscated gradients, we find that no reliable backward differentiable approximation can be built for obfuscated parameters. Moreover, while our model reaches good natural accuracy with a hybrid backpropagation and synthetic gradient training method, the same approach is suboptimal when used to generate adversarial examples. We find that the combination of a random projection and binarization in the optical system also improves robustness against various types of black-box attacks. Finally, our hybrid training method builds features that are robust against transfer attacks. We demonstrate our approach on a VGG-like architecture, placing the defense on top of the convolutional features, on CIFAR-10 and CIFAR-100. Code is available at https://github.com/lightonai/adversarial-robustness-by-design.
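The fixed nonlinear random transformation described in the abstract can be sketched in software as a random-features map. The following is a minimal illustration only: the squared-modulus nonlinearity, the median-based binarization threshold, and the dimensions are assumptions for exposition, not the actual device's parameters (which are, by design, unknown to an attacker).

```python
import numpy as np

def optical_random_features(x, W):
    """Hypothetical sketch: fixed nonlinear random projection followed by
    binarization. W plays the role of the co-processor's unknown parameters."""
    y = np.abs(W @ x) ** 2  # fixed random projection with a nonlinearity
    # Simple binarization (threshold choice is an assumption)
    return (y > np.median(y)).astype(np.float32)

rng = np.random.default_rng(0)
d_in, d_out = 128, 256
# Fixed complex Gaussian matrix, standing in for the obfuscated parameters
W = rng.normal(size=(d_out, d_in)) + 1j * rng.normal(size=(d_out, d_in))
x = rng.normal(size=d_in)
z = optical_random_features(x, W)  # binary feature vector of length d_out
```

Because the transformation is non-differentiable (binarization) and its parameters are hidden, gradient-based attacks cannot backpropagate through it directly, which is the property the defense exploits.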