Machine learning models are known to be susceptible to adversarial attacks, which can cause misclassification by introducing small but well-designed perturbations. In this paper, we consider a classical hypothesis testing problem in order to develop fundamental insight into defending against such adversarial perturbations. We interpret an adversarial perturbation as a nuisance parameter, and propose a defense based on applying the generalized likelihood ratio test (GLRT) to the resulting composite hypothesis testing problem, jointly estimating the class of interest and the adversarial perturbation. While the GLRT approach is applicable to general multi-class hypothesis testing, we first evaluate it for binary hypothesis testing in white Gaussian noise under $\ell_{\infty}$ norm-bounded adversarial perturbations, for which a known minimax defense optimizing for the worst-case attack provides a benchmark. We derive the worst-case attack for the GLRT defense, and show that its asymptotic performance (as the dimension of the data increases) approaches that of the minimax defense. For non-asymptotic regimes, we show via simulations that the GLRT defense is competitive with the minimax approach under the worst-case attack, while yielding a better robustness-accuracy tradeoff under weaker attacks. We also illustrate the GLRT approach for a multi-class hypothesis testing problem, for which a minimax strategy is not known, and evaluate its performance under both noise-aware and noise-agnostic adversarial settings: we provide a method to find optimal noise-aware attacks, and heuristics to find noise-agnostic attacks that are close to optimal in the high-SNR regime.
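To make the GLRT defense concrete, the following is a minimal sketch for the binary case described above, under the assumed antipodal signaling model $y = \pm s + e + n$ with $\|e\|_\infty \le \epsilon$ and white Gaussian noise $n$. Maximizing the Gaussian likelihood over the perturbation $e$ for each hypothesis clips each coordinate of the residual into $[-\epsilon, \epsilon]$, so the GLRT reduces to comparing soft-thresholded residual energies. The function name and signal model here are illustrative, not taken verbatim from the paper.

```python
import numpy as np

def glrt_decide(y, s, eps):
    """GLRT for H1: y = s + e + n versus H0: y = -s + e + n,
    with ||e||_inf <= eps and white Gaussian noise n.

    For each hypothesis, the likelihood-maximizing perturbation clips
    each residual coordinate into [-eps, eps]; what remains is the
    soft-thresholded residual, whose energy drives the decision.
    """
    # soft-thresholded residual under each hypothesis
    r1 = np.maximum(np.abs(y - s) - eps, 0.0)  # best-case residual under H1
    r0 = np.maximum(np.abs(y + s) - eps, 0.0)  # best-case residual under H0
    # declare the hypothesis with the smaller residual energy
    return 1 if np.sum(r1 ** 2) < np.sum(r0 ** 2) else 0
```

Note that the joint estimation of class and perturbation collapses to a closed form here: the per-coordinate clipping is the maximizer, so no iterative optimization over $e$ is needed.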