The information bottleneck (IB) method is a viable defense against adversarial attacks in deep learning. However, it suffers from spurious correlations, which limit further gains in adversarial robustness. In this paper, we incorporate causal inference into the IB framework to alleviate this problem. Specifically, we use instrumental variables to divide the features obtained by the IB method into robust features (content information) and non-robust features (style information) in order to estimate the causal effects. Under this framework, the influence of non-robust features can be mitigated, strengthening adversarial robustness. We also analyze the effectiveness of the proposed method. Extensive experiments on MNIST, FashionMNIST, and CIFAR-10 show that our method is considerably robust against multiple adversarial attacks. Our code will be released.