Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted significant attention due to their ability to generate high-quality synthetic images. In this work, we show that diffusion models memorize individual images from their training data and emit them at generation time. With a generate-and-filter pipeline, we extract over a thousand training examples from state-of-the-art models, ranging from photographs of individual people to trademarked company logos. We also train hundreds of diffusion models in various settings to analyze how different modeling and data decisions affect privacy. Overall, our results show that diffusion models are much less private than prior generative models such as GANs, and that mitigating these vulnerabilities may require new advances in privacy-preserving training.
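The generate-and-filter pipeline can be sketched in miniature: draw many samples from a model and keep the ones that land unusually close to some training example, treating closeness as a proxy for memorization. This is a toy illustration only — `generate`, the vector representation of images, the plain L2 nearest-neighbor distance, and the `threshold` are all stand-in assumptions, not the paper's actual extraction procedure, which relies on stronger near-duplicate and membership-inference measures.

```python
import numpy as np

def generate_and_filter(generate, training_set, n_samples=1000, threshold=0.1):
    """Hypothetical generate-and-filter loop.

    generate     -- callable returning one sample as a numpy vector (stand-in
                    for sampling from a diffusion model)
    training_set -- iterable of training examples as numpy vectors
    threshold    -- distance below which a sample counts as a near-duplicate
                    of a training example (an illustrative cutoff)
    """
    extracted = []
    for _ in range(n_samples):
        sample = generate()
        # Distance to the nearest training example; a real pipeline would
        # use a perceptual or embedding-based near-duplicate measure.
        nearest = min(np.linalg.norm(sample - x) for x in training_set)
        if nearest < threshold:
            extracted.append(sample)
    return extracted
```

For instance, a degenerate "model" that always emits a memorized training point would have every sample flagged by the filter, while a model producing genuinely novel samples would yield few or no matches.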