Cutting-edge diffusion models produce images with high quality and customizability, enabling them to be used for commercial art and graphic design purposes. But do diffusion models create unique works of art, or are they stealing content directly from their training sets? In this work, we study image retrieval frameworks that enable us to compare generated images with training samples and detect when content has been replicated. Applying our frameworks to diffusion models trained on multiple datasets including Oxford flowers, Celeb-A, ImageNet, and LAION, we discuss how factors such as training set size impact rates of content replication. We also identify cases where diffusion models, including the popular Stable Diffusion model, blatantly copy from their training data.
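To make the retrieval idea concrete, below is a minimal sketch of how a replication check of this kind can be set up: embed generated and training images with an off-the-shelf descriptor, then flag generated images whose nearest training neighbor exceeds a similarity threshold. The backbone (an ImageNet-pretrained ResNet-50), the threshold, and the file paths are illustrative assumptions, not the paper's actual retrieval framework or descriptors.

```python
# Hedged sketch of a retrieval-based replication check.
# Assumes torchvision >= 0.13 for the `weights=` API; backbone and
# threshold are stand-ins, not the descriptors used in the paper.
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import models, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# ImageNet-pretrained ResNet-50 with the classifier head removed,
# so each image maps to a 2048-d feature vector.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval().to(device)

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(paths):
    """Return L2-normalized descriptors for a list of image file paths."""
    batch = torch.stack(
        [preprocess(Image.open(p).convert("RGB")) for p in paths]
    )
    feats = backbone(batch.to(device))
    return F.normalize(feats, dim=-1)

def flag_replications(generated_paths, training_paths, threshold=0.9):
    """For each generated image, find its nearest training image by cosine
    similarity; pairs above `threshold` are candidate copies."""
    gen, train = embed(generated_paths), embed(training_paths)
    sims = gen @ train.T              # cosine similarity matrix
    scores, idx = sims.max(dim=1)     # nearest training neighbor per image
    return [
        (g, training_paths[i], s.item(), s.item() >= threshold)
        for g, i, s in zip(generated_paths, idx, scores)
    ]
```

In practice the choice of descriptor matters: a self-supervised copy-detection embedding will surface near-duplicate crops and restyled copies that a classification backbone like the one above may miss.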