Image generation with diffusion models can be controlled in multiple ways. In this paper, we systematically analyze the equations of modern generative diffusion networks and propose a framework, called MDP, that explains the design space of suitable manipulations. We identify five different manipulations: the intermediate latent, the conditional embedding, the cross-attention maps, the guidance, and the predicted noise. We analyze the corresponding parameters of these manipulations and the manipulation schedule. We show that some previous editing methods fit naturally into our framework. In particular, we identify one specific configuration, manipulating the predicted noise, as a new type of control that can perform higher-quality edits than previous work for a variety of local and global edits.
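To make the idea of manipulating the predicted noise under a schedule more concrete, the following is a minimal sketch and not the MDP formulation itself. It assumes two noise predictions are available at each denoising step, one under the source condition and one under the target (edit) condition, and blends them linearly during part of the schedule. The function name `blend_predicted_noise` and the parameters `start_frac`, `end_frac`, and `weight` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def blend_predicted_noise(
    eps_source: Sequence[float],
    eps_target: Sequence[float],
    step: int,
    total_steps: int,
    start_frac: float = 0.0,   # hypothetical: where the manipulation starts
    end_frac: float = 0.5,     # hypothetical: where the manipulation stops
    weight: float = 1.0,       # hypothetical: blend strength while active
) -> List[float]:
    """Linearly blend two noise predictions during part of the schedule.

    Illustrative assumption only: outside the window [start_frac, end_frac)
    the source prediction is returned unchanged.
    """
    if len(eps_source) != len(eps_target):
        raise ValueError("noise predictions must have the same length")
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")

    # Fraction of the denoising process completed (0.0 at the first step).
    progress = step / total_steps
    active = start_frac <= progress < end_frac
    w = weight if active else 0.0

    # w = 0 keeps the source prediction; w = 1 uses the target prediction.
    return [(1.0 - w) * s + w * t for s, t in zip(eps_source, eps_target)]


if __name__ == "__main__":
    src = [1.0, 2.0]
    tgt = [3.0, 4.0]
    # Early step: inside the window, so the two predictions are mixed.
    print(blend_predicted_noise(src, tgt, step=0, total_steps=10, weight=0.5))
    # Late step: outside the window, so the source prediction is kept.
    print(blend_predicted_noise(src, tgt, step=8, total_steps=10, weight=0.5))
```

In this sketch, the window bounds play the role of a manipulation schedule and `weight` plays the role of a manipulation parameter. The specific blending rule is an assumption made for the example.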