Denoising diffusion models (DDMs) have attracted considerable attention for their sample quality and diversity. Despite this strong performance, DDMs remain black boxes whose inner workings need further study before the field can advance substantially. Motivated by this, we examine the design of conventional U-shaped diffusion models. Specifically, we investigate the self-attention modules within these models through carefully designed experiments and characterize their behavior. In addition, inspired by studies that substantiate the effectiveness of guidance schemes, we present a plug-and-play diffusion guidance method, Self-Attention Guidance (SAG), that can substantially improve the performance of existing diffusion models. At every iteration, SAG extracts an intermediate attention map from the diffusion model, selects the tokens whose attention score exceeds a threshold, and masks and blurs them to obtain a partially blurred input. We then measure the difference between the noise predicted from the blurred input and the noise predicted from the original input, and use this difference as guidance. With this guidance, we observe clear improvements across a wide range of diffusion models, e.g., ADM, IDDPM, and Stable Diffusion, and show that the results improve further when our method is combined with the conventional guidance scheme. We provide extensive ablation studies to verify our design choices.
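The guidance step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several pieces are assumptions made for illustration: the attention map is taken as already extracted and resized to the input resolution, the blur is applied directly to the noisy input `x_t`, the default threshold is the mean attention score, and the final combination `eps + scale * (eps - eps_hat)` follows the usual form of guidance rules rather than anything stated in the abstract. The names `gaussian_blur`, `sag_noise_estimate`, `eps_model`, `scale`, and `threshold` are hypothetical.

```python
import numpy as np


def gaussian_blur(x, sigma=1.0, radius=2):
    """Separable Gaussian blur over the last two axes of a (C, H, W) array."""
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()  # normalise so constant images stay unchanged
    height, width = x.shape[1], x.shape[2]
    padded = np.pad(x, ((0, 0), (radius, radius), (radius, radius)), mode="reflect")
    # Blur along the height axis, then along the width axis.
    rows = sum(k * padded[:, i:i + height, :] for i, k in enumerate(kernel))
    return sum(k * rows[:, :, j:j + width] for j, k in enumerate(kernel))


def sag_noise_estimate(eps_model, x_t, attn_map, scale=0.1, threshold=None):
    """Return a guided noise estimate and the mask that was used.

    eps_model : callable mapping a (C, H, W) input to predicted noise (hypothetical).
    attn_map  : (H, W) attention scores, assumed already resized to the input size.
    scale     : guidance strength (assumed hyperparameter).
    threshold : attention cutoff; defaults to the mean score (an assumption).
    """
    if threshold is None:
        threshold = attn_map.mean()
    # 1. Select locations whose attention score exceeds the threshold.
    mask = (attn_map > threshold).astype(x_t.dtype)
    # 2. Blur only the selected locations to get a partially blurred input.
    x_hat = mask * gaussian_blur(x_t) + (1.0 - mask) * x_t
    # 3. Predict noise for the original and the partially blurred input.
    eps = eps_model(x_t)
    eps_hat = eps_model(x_hat)
    # 4. Use the difference between the two predictions as guidance.
    return eps + scale * (eps - eps_hat), mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_t = rng.standard_normal((3, 8, 8))
    attn = rng.random((8, 8))
    toy_model = lambda z: 0.5 * z  # stand-in for a real denoising network
    guided, mask = sag_noise_estimate(toy_model, x_t, attn, scale=0.5)
    print(guided.shape, int(mask.sum()))
```

With `scale=0` the function returns the unguided prediction, and an empty mask leaves the input untouched, so the guidance term vanishes in both cases.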