It is well-known that machine learning models are vulnerable to small but cleverly-designed adversarial perturbations that can cause misclassification. While there has been major progress in designing attacks and defenses for various adversarial settings, many fundamental and theoretical problems are yet to be resolved. In this paper, we consider classification in the presence of $\ell_0$-bounded adversarial perturbations, a.k.a. sparse attacks. This setting is significantly different from other $\ell_p$-adversarial settings, with $p\geq 1$, as the $\ell_0$-ball is non-convex and highly non-smooth. Under the assumption that data is distributed according to the Gaussian mixture model, our goal is to characterize the optimal robust classifier and the corresponding robust classification error, as well as a variety of trade-offs between robustness, accuracy, and the adversary's budget. To this end, we develop a novel classification algorithm called FilTrun that has two main modules: Filtration and Truncation. The key idea of our method is to first filter out the non-robust coordinates of the input and then apply a carefully-designed truncated inner product for classification. By analyzing the performance of FilTrun, we derive an upper bound on the optimal robust classification error. We also find a lower bound by designing a specific adversarial strategy that enables us to derive the corresponding robust classifier and its achieved error. For the case that the covariance matrix of the Gaussian mixtures is diagonal, we show that as the input's dimension gets large, the upper and lower bounds converge; i.e., we characterize the asymptotically-optimal robust classifier. Throughout, we discuss several examples that illustrate interesting behaviors, such as the existence of a phase transition in the adversary's budget that determines whether the effect of adversarial perturbation can be fully neutralized.
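The two-stage idea described above (filter non-robust coordinates, then classify via a truncated inner product) can be illustrated with a minimal sketch. This is a hypothetical toy implementation under an assumed two-class Gaussian mixture with means $\pm\mu$ and identity covariance; the thresholds `tau` and `T` and the filtering rule are illustrative assumptions, not the paper's exact construction. The point it demonstrates is that truncation caps the influence of any single coordinate, so an $\ell_0$-bounded adversary altering $k$ coordinates can shift the decision score by at most $2kT$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed data model (illustrative): x = y * mu + noise, with label
# y in {-1, +1}; the adversary may arbitrarily alter up to k coordinates
# (an l0-bounded, i.e. sparse, perturbation).
d, k = 200, 5                        # dimension, adversary's budget
mu = rng.uniform(0.5, 2.0, size=d)   # class mean (diagonal-covariance case)


def filtrun_predict(x, mu, tau=0.5, T=1.0):
    """FilTrun-style decision rule (hypothetical sketch).

    Filtration: discard coordinates whose mean is too weak to be robust.
    Truncation: clip each per-coordinate term so no single (possibly
    corrupted) coordinate can dominate the score.
    """
    keep = np.abs(mu) >= tau              # Filtration step
    terms = mu[keep] * x[keep]            # per-coordinate evidence
    score = np.clip(terms, -T, T).sum()   # Truncated inner product
    return 1 if score >= 0 else -1


# Clean sample from class +1.
x = mu + rng.standard_normal(d)
y_hat = filtrun_predict(x, mu)

# Sparse attack: overwrite the k coordinates carrying the most evidence
# with huge opposite-sign values. Truncation limits the damage from each
# corrupted coordinate to at most 2*T, so the prediction survives.
x_adv = x.copy()
worst = np.argsort(-np.abs(mu * x))[:k]
x_adv[worst] = -1e6 * np.sign(mu[worst])
y_adv = filtrun_predict(x_adv, mu)
```

Without the `np.clip` truncation, the same $k$-coordinate attack would drive the plain inner product to an arbitrarily negative value and flip the prediction, which is the intuition behind pairing filtration with truncation.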