We introduce an approach for training Variational Autoencoders (VAEs) that are certifiably robust to adversarial attacks. Specifically, we first derive actionable bounds on the minimal size of an input perturbation required to change a VAE's reconstruction by more than an allowed amount; these bounds depend on key parameters such as the Lipschitz constants of the encoder and decoder. We then show how these parameters can be controlled, providing a mechanism to ensure a priori that a VAE will attain a desired level of robustness. We extend this into a complete, practical approach for training VAEs that meet our criteria. Critically, our method allows one to specify a desired level of robustness upfront and then train a VAE that is guaranteed to achieve it. We further demonstrate that these Lipschitz-constrained VAEs are more robust to attack than standard VAEs in practice.
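To make the role of the Lipschitz constants concrete, the following is a minimal sketch, not the bound or training procedure derived in the paper. It assumes a simplified, deterministic encoder and decoder with Lipschitz constants `k_enc` and `k_dec`, for which a perturbation of norm below `r / (k_enc * k_dec)` cannot move the reconstruction by more than `r`. It also assumes one common way of controlling a layer's Lipschitz constant, namely rescaling its weight matrix by an estimate of its spectral norm. All function names here are hypothetical.

```python
import math


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def spectral_norm(W, iters=200):
    """Estimate the largest singular value of W by power iteration on W^T W.

    The estimate never exceeds the true value, so it is a sketch-level
    approximation rather than a certified upper bound.
    """
    Wt = [list(col) for col in zip(*W)]
    v = [1.0] * len(W[0])  # assumed start vector; must not be orthogonal to the top singular vector
    for _ in range(iters):
        u = matvec(Wt, matvec(W, v))
        norm = math.sqrt(sum(x * x for x in u))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in u]
    return math.sqrt(sum(x * x for x in matvec(W, v)))


def rescale_to_lipschitz(W, k):
    """Scale W down, if needed, so its estimated spectral norm is at most k."""
    sigma = spectral_norm(W)
    scale = 1.0 if sigma <= k else k / sigma
    return [[scale * w for w in row] for row in W]


def min_perturbation(r, k_enc, k_dec):
    """Smallest input perturbation norm that could change the output by more than r.

    Assumes deterministic maps: ||g(f(x + d)) - g(f(x))|| <= k_dec * k_enc * ||d||.
    """
    return r / (k_enc * k_dec)
```

For example, `min_perturbation(1.0, 2.0, 2.5)` gives the perturbation norm below which a reconstruction change larger than 1.0 is impossible under these simplified assumptions. A real VAE has a stochastic encoder, so the paper's bounds necessarily involve more than this product of constants.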