We propose the first general PAC-Bayesian generalization bounds for adversarial robustness, which quantify, at test time, how invariant a model is to imperceptible perturbations of its input. Instead of deriving a worst-case analysis of the risk of a hypothesis over all possible perturbations, we leverage the PAC-Bayesian framework to bound the averaged risk over the perturbations for majority votes (over the whole class of hypotheses). Our theoretically founded analysis has the advantage of providing general bounds (i) that are valid for any kind of attack (i.e., any adversarial attack), (ii) that are tight thanks to the PAC-Bayesian framework, and (iii) that can be directly minimized during the learning phase to obtain a model that is robust to different attacks at test time.
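To make the contrast concrete, here is one illustrative way to formalize the two quantities; the notation (data distribution $D$, perturbation set $B(x)$, perturbation distribution $P_x$, posterior $\rho$ over the hypothesis class $\mathcal{H}$, and majority vote $\mathrm{MV}_\rho$) is a sketch assumed for exposition rather than taken verbatim from the paper. A worst-case analysis bounds
$$ R_{\mathrm{adv}}(\mathrm{MV}_\rho) \;=\; \mathbb{E}_{(x,y)\sim D}\, \max_{\epsilon \in B(x)} \mathbb{1}\big[\mathrm{MV}_\rho(x+\epsilon) \neq y\big], $$
whereas the averaged risk replaces the inner maximum by an expectation over perturbations,
$$ \bar{R}(\mathrm{MV}_\rho) \;=\; \mathbb{E}_{(x,y)\sim D}\, \mathbb{E}_{\epsilon \sim P_x}\, \mathbb{1}\big[\mathrm{MV}_\rho(x+\epsilon) \neq y\big], $$
a quantity that a PAC-Bayesian bound on the $\rho$-weighted majority vote can control directly.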