Variational autoencoders (VAEs) have recently been shown to be vulnerable to adversarial attacks, wherein they are fooled into reconstructing a chosen target image. However, how to defend against such attacks remains an open problem. We make significant advances in addressing this issue by introducing methods for producing adversarially robust VAEs. Namely, we first demonstrate that methods proposed to obtain disentangled latent representations produce VAEs that are more robust to these attacks. However, this robustness comes at the cost of reduced reconstruction quality. We ameliorate this by applying disentangling methods to hierarchical VAEs. The resulting models are high-fidelity autoencoders that are also adversarially robust. We confirm their capabilities on several datasets, under current state-of-the-art VAE adversarial attacks, and further show that they increase the robustness of downstream tasks to attack.
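For concreteness, the sketch below illustrates the kind of latent-space attack the abstract refers to: an attacker optimises a small input perturbation so that the VAE encodes the perturbed image close to the encoding of a chosen target, causing the decoder to reconstruct the target instead. This is a minimal PyTorch sketch, not the paper's exact attack; `encoder`, `x`, and `x_target` are hypothetical names, and real attacks typically minimise a KL divergence between the two approximate posteriors rather than the squared distance between posterior means used here.

```python
import torch

def latent_space_attack(encoder, x, x_target, steps=200, lr=1e-2, lam=1e-3):
    # `encoder` is assumed to return (mean, log-variance) of q(z|x);
    # `x` and `x_target` are image tensors of matching shape.
    with torch.no_grad():
        mu_t, _ = encoder(x_target)  # encoding of the attack target

    d = torch.zeros_like(x, requires_grad=True)  # adversarial perturbation
    opt = torch.optim.Adam([d], lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        mu_a, _ = encoder(x + d)
        # Pull the adversarial encoding towards the target's encoding,
        # while an L2 penalty keeps the perturbation small.
        loss = (mu_a - mu_t).pow(2).sum() + lam * d.pow(2).sum()
        loss.backward()
        opt.step()

    return (x + d).detach()  # adversarial input the decoder maps near x_target
```

A robust VAE, in the sense studied here, is one for which no small perturbation found by such an optimisation yields a reconstruction close to the target.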