Deep Neural Network (DNN) models are vulnerable to security attacks, with adversaries often employing sophisticated techniques to probe and exploit their structures. Data poisoning-enabled perturbation attacks are complex adversarial attacks that inject false data into a model. They corrupt the learning process, and deeper networks offer no inherent protection, as these attacks degrade a model's accuracy and convergence rate. In this paper, we propose an attack-agnostic defense method for mitigating their influence. In it, a Defensive Feature Layer (DFL) is integrated with a well-known DNN architecture to neutralize the effects of illegitimate perturbation samples in the feature space. To boost the robustness and trustworthiness of this method in correctly classifying attacked input samples, we regularize the hidden space of a trained model with a discriminative loss function called Polarized Contrastive Loss (PCL), which sharpens the separation between samples of different classes while preserving the resemblance of samples within the same class. We then combine the DFL and PCL in a compact model for defending against data poisoning attacks. The method is trained and tested on the CIFAR-10 and MNIST datasets under data poisoning-enabled perturbation attacks, and the experimental results show that it outperforms recent peer techniques.
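The abstract does not specify PCL's exact formulation. As a rough illustration only, the following is a minimal PyTorch sketch of a margin-based contrastive loss of the general kind described, pulling same-class features together while pushing different-class features apart; the function name, `margin` parameter, and pairwise formulation are assumptions for illustration, not the authors' definition.

```python
import torch
import torch.nn.functional as F

def polarized_contrastive_loss(features, labels, margin=1.0):
    """Hypothetical sketch of a polarized contrastive loss: attracts
    same-class feature vectors and repels different-class ones until
    they are at least `margin` apart. Not the paper's exact PCL."""
    # Pairwise Euclidean distances between feature vectors, shape (N, N)
    dists = torch.cdist(features, features, p=2)
    # same[i, j] is True when samples i and j share a class label
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(labels), dtype=torch.bool, device=features.device)
    pos = same & ~eye          # same-class pairs, excluding self-pairs
    neg = ~same                # different-class pairs
    # Attract positives; repel negatives that fall inside the margin
    pos_loss = (dists[pos] ** 2).mean()
    neg_loss = (F.relu(margin - dists[neg]) ** 2).mean()
    return pos_loss + neg_loss
```

In a setup like the one the abstract describes, such a term would be added to the usual classification loss (e.g. cross-entropy) and computed on the hidden-layer features, so that the regularized feature space makes poisoned samples easier to separate from legitimate ones.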