We introduce an approach for training Variational Autoencoders (VAEs) that are certifiably robust to adversarial attack. Specifically, we first derive actionable bounds on the minimal size of an input perturbation required to change a VAE's reconstruction by more than an allowed amount, with these bounds depending on certain key parameters such as the Lipschitz constants of the encoder and decoder. We then show how these parameters can be controlled, thereby providing a mechanism to ensure \textit{a priori} that a VAE will attain a desired level of robustness. Moreover, we extend this to a complete practical approach for training such VAEs to ensure our criteria are met. Critically, our method allows one to specify a desired level of robustness \textit{upfront} and then train a VAE that is guaranteed to achieve this robustness. We further demonstrate that these Lipschitz-constrained VAEs are more robust to attack than standard VAEs in practice.
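To make the role of the Lipschitz constants concrete, the following is a minimal sketch and not the bound or training procedure of this work. It assumes a deterministic ReLU network without biases, whereas a VAE is stochastic and needs a more careful argument. If a map $f$ is $M$-Lipschitz, then $\|f(x+\delta)-f(x)\| \le M\|\delta\|$, so any perturbation that changes the output by more than $r$ must satisfy $\|\delta\| > r/M$. All function names here (`spectral_norm`, `lipschitz_upper_bound`, `constrain`, `forward`, `min_perturbation_norm`) are hypothetical and chosen for illustration only.

```python
import numpy as np

def spectral_norm(W):
    # Largest singular value: the Lipschitz constant of x -> W @ x in the 2-norm.
    return float(np.linalg.norm(W, 2))

def lipschitz_upper_bound(weights):
    # Product of layer spectral norms. This is an upper bound on the network's
    # Lipschitz constant when every activation is 1-Lipschitz (e.g. ReLU).
    return float(np.prod([spectral_norm(W) for W in weights]))

def constrain(weights, target):
    # Illustrative choice: split the target evenly across layers and shrink any
    # layer whose spectral norm exceeds its share. Layers already within their
    # share are left unchanged.
    per_layer = target ** (1.0 / len(weights))
    return [W * min(1.0, per_layer / max(spectral_norm(W), 1e-12)) for W in weights]

def forward(weights, x):
    # ReLU on hidden layers, linear output layer, no biases (an assumption).
    for W in weights[:-1]:
        x = np.maximum(W @ x, 0.0)
    return weights[-1] @ x

def min_perturbation_norm(r, M):
    # For an M-Lipschitz map, changing the output by more than r
    # requires an input perturbation of norm greater than r / M.
    return r / M

rng = np.random.default_rng(0)
weights = [rng.normal(size=(8, 4)), rng.normal(size=(8, 8)), rng.normal(size=(3, 8))]
constrained = constrain(weights, target=2.0)
print("Lipschitz upper bound:", lipschitz_upper_bound(constrained))
print("perturbation norm needed to move the output by more than 1.0:",
      min_perturbation_norm(1.0, 2.0))
```

The design point is that the target constant is fixed before any data is seen, which mirrors the idea of specifying robustness upfront. The product-of-norms bound is generally loose, so the resulting margin is conservative.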