We make inroads into understanding the robustness of Variational Autoencoders (VAEs) to adversarial attacks and other input perturbations. While previous work has developed algorithmic approaches to attacking and defending VAEs, there remains a lack of formalization for what it means for a VAE to be robust. To address this, we develop a novel criterion for robustness in probabilistic models: $r$-robustness. We then use this to construct the first theoretical results for the robustness of VAEs, deriving margins in the input space within which we can provide guarantees on the resulting reconstruction. Informally, we define a region around an input such that any perturbation within it produces a reconstruction similar to the original reconstruction. To support our analysis, we show that VAEs trained using disentangling methods not only score well under our robustness metrics, but that the reasons for this can be explained through our theoretical results.