Deep Generative Models (DGMs) are a popular class of deep learning models that are widely used because of their ability to synthesize data from complex, high-dimensional manifolds. However, despite their increasing industrial adoption, they have not been subject to rigorous security and privacy analysis. In this work we examine one such aspect, namely backdoor attacks on DGMs, which can significantly limit the applicability of pre-trained models within a model supply chain and, at the very least, cause serious reputational damage for companies outsourcing DGMs from third parties. While similar attack scenarios have been studied in the context of classical prediction models, their manifestation in DGMs has not received the same attention. To this end we propose novel training-time attacks which result in corrupted DGMs that synthesize regular data under normal operations and designated target outputs for inputs sampled from a trigger distribution. These attacks are based on an adversarial loss function that combines the dual objectives of attack stealth and fidelity. We systematically analyze these attacks and show their effectiveness for a variety of approaches, such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), as well as for different data domains, including images and audio. Our experiments show that, even for large-scale industry-grade DGMs (like StyleGAN), our attacks can be mounted with only modest computational effort. We also motivate suitable defenses based on static/dynamic model and output inspections, demonstrate their usefulness, and prescribe a practical and comprehensive defense strategy that paves the way for safe usage of DGMs.
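To make the idea of a loss that trades off stealth against fidelity concrete, the following is a minimal sketch and not the formulation used in this work. It assumes squared-error distances for both terms, a single trigger latent instead of a trigger distribution, and a toy linear generator; the names `backdoor_loss`, `g_star`, `g_clean`, `z_trigger`, `x_target` and the weight `lam` are all hypothetical and chosen only for illustration.

```python
import numpy as np

def backdoor_loss(g_star, g_clean, z_batch, z_trigger, x_target, lam=1.0):
    """Illustrative combined objective: stealth + lam * fidelity (an assumption)."""
    z_trigger = np.atleast_2d(z_trigger)
    # Stealth term: on ordinary latent samples, the corrupted generator g_star
    # should stay close to the clean generator g_clean.
    stealth = float(np.mean(np.sum((g_star(z_batch) - g_clean(z_batch)) ** 2, axis=1)))
    # Fidelity term: on the trigger input, g_star should produce the target output.
    fidelity = float(np.mean(np.sum((g_star(z_trigger) - x_target) ** 2, axis=1)))
    return stealth + lam * fidelity, stealth, fidelity

# Toy setup (hypothetical): a linear "generator" from a 2-d latent to 4-d data.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 2))
g_clean = lambda z: z @ W.T

z_batch = rng.normal(size=(64, 2))   # ordinary latent samples
z_trigger = np.array([10.0, 10.0])   # a single trigger latent
x_target = np.ones(4)                # designated target output

# Evaluating the clean generator as the candidate: it is perfectly stealthy
# by construction, so the whole loss comes from the fidelity term.
total, stealth, fidelity = backdoor_loss(
    g_clean, g_clean, z_batch, z_trigger, x_target, lam=0.5
)
```

In this sketch an attacker would minimize `total` over the parameters of `g_star`; a larger `lam` favors hitting the target on the trigger, while a smaller one favors staying indistinguishable from the clean model on regular inputs.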