This paper proposes an ensemble learning model that is resistant to adversarial attacks. To build this resilience, we introduce a training process in which each member learns a radically distinct latent space. Member models are added to the ensemble one at a time. During training, the loss function is regularized by reverse knowledge distillation, forcing the new member to learn different features and to map inputs to a latent space safely distanced from those of the existing members. We assessed the security and performance of the proposed solution on image classification tasks using the CIFAR10 and MNIST datasets and showed improvements in both security and performance compared to state-of-the-art defense methods.
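As a rough illustration only (not the paper's exact objective), the training loss described above could be sketched as a standard classification term plus a latent-space repulsion term against the already-trained members; the function name, the cosine-similarity penalty, and the weight `lam` are assumptions made for this sketch.

```python
import torch
import torch.nn.functional as F

def ensemble_member_loss(logits, latent, target, frozen_latents, lam=0.1):
    """Hypothetical loss for training a new ensemble member.

    Combines the usual classification loss with a repulsion term that
    penalizes similarity between the new member's latent representation
    and those produced by already-trained (frozen) members, pushing the
    new member toward a distinct latent space.
    """
    # Standard supervised objective for the new member.
    ce = F.cross_entropy(logits, target)

    # Repulsion term: mean cosine similarity to each frozen member's latent.
    # Minimizing this similarity drives the representations apart -- the
    # "reverse" of ordinary knowledge distillation, which would pull the
    # student's representation toward the teacher's.
    sim = torch.stack([
        F.cosine_similarity(latent, z.detach(), dim=1).mean()
        for z in frozen_latents
    ]).mean()

    return ce + lam * sim
```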