Multi-label classification, which predicts a set of labels for an input, has many applications. However, multiple recent studies have shown that multi-label classification is vulnerable to adversarial examples: an attacker can manipulate the labels predicted by a multi-label classifier for an input by adding a carefully crafted, human-imperceptible perturbation to it. Existing provable defenses for multi-class classification achieve sub-optimal provable robustness guarantees when generalized to multi-label classification. In this work, we propose MultiGuard, the first provably robust defense against adversarial examples for multi-label classification. MultiGuard leverages randomized smoothing, the state-of-the-art technique for building provably robust classifiers. Specifically, given an arbitrary multi-label classifier, MultiGuard builds a smoothed multi-label classifier by adding random noise to the input; we consider isotropic Gaussian noise in this work. Our major theoretical contribution is to show that a certain number of the ground-truth labels of an input are provably in the set of labels predicted by MultiGuard when the $\ell_2$-norm of the adversarial perturbation added to the input is bounded. Moreover, we design an algorithm to compute our provable robustness guarantees. Empirically, we evaluate MultiGuard on the VOC 2007, MS-COCO, and NUS-WIDE benchmark datasets. Our code is available at \url{https://github.com/quwenjie/MultiGuard}.
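To make the smoothing construction concrete, below is a minimal Monte Carlo sketch of a smoothed multi-label predictor, written here only for illustration: it counts how often each label appears in a base classifier's top-$k$ set under isotropic Gaussian noise and returns the $k$ most frequent labels. The names (`smoothed_topk_labels`, `base_model`) and hyper-parameters (`k`, `sigma`, `n_samples`) are our assumptions, and the sketch omits the Monte Carlo confidence bounds that a certified guarantee like MultiGuard's would require.

```python
import torch

def smoothed_topk_labels(base_model, x, k=3, sigma=0.5,
                         n_samples=1000, batch_size=100):
    """Illustrative sketch of a smoothed multi-label predictor.

    Estimates, via Monte Carlo sampling, how often each label appears
    in the base model's top-k predictions under isotropic Gaussian
    noise, then returns the k most frequent labels. This is a sketch
    under our own assumptions, not the paper's certified algorithm.
    """
    counts = None
    remaining = n_samples
    with torch.no_grad():
        while remaining > 0:
            b = min(batch_size, remaining)
            remaining -= b
            # Isotropic Gaussian noise added to copies of the input x.
            noise = sigma * torch.randn(b, *x.shape)
            logits = base_model(x.unsqueeze(0) + noise)   # (b, num_labels)
            # Base classifier's k predicted labels per noisy sample.
            topk = logits.topk(k, dim=1).indices          # (b, k)
            onehot = torch.zeros_like(logits).scatter_(1, topk, 1.0)
            batch_counts = onehot.sum(dim=0)              # (num_labels,)
            counts = batch_counts if counts is None else counts + batch_counts
    # Smoothed prediction: the k labels most frequently selected
    # across noisy samples.
    return counts.topk(k).indices
```

In this sketch, increasing `n_samples` tightens the empirical label-frequency estimates; the certified version would additionally lower-bound these frequencies with confidence intervals before deriving an $\ell_2$ robustness radius.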