训练无关的交叉注意力引导布局控制 (Training-Free Layout Control with Cross-Attention Guidance)

Recent diffusion-based generators can produce high-quality images based only on textual prompts. However, they do not correctly interpret instructions that specify the spatial layout of the composition. We propose a simple approach that can achieve robust layout control without requiring training or fine-tuning the image generator. Our technique, which we call layout guidance, manipulates the cross-attention layers that the model uses to interface textual and visual information and steers the reconstruction in the desired direction given, e.g., a user-specified layout. In order to determine how to best guide attention, we study the role of different attention maps when generating images and experiment with two alternative strategies, forward and backward guidance. We evaluate our method quantitatively and qualitatively with several experiments, validating its effectiveness. We further demonstrate its versatility by extending layout guidance to the task of editing the layout and context of a given real image.

翻译：最近基于扩散的生成器可以仅基于文本提示生成高质量的图像。然而，它们不能正确解释指定组成的空间布局的指令。我们提出了一种简单的方法，可以在不需要对图像生成器进行训练或微调的情况下实现稳健的布局控制。我们的技术称为布局引导，它操纵交叉注意力层，该层用于接口文本和视觉信息，并根据用户指定的布局在所需方向上引导重建。为了确定如何最有效地引导注意力，我们研究生成图像时不同注意力图的作用，并尝试两种替代策略，前向和后向引导。我们通过多个实验定量和定性地评估了我们的方法，验证了其有效性。我们通过将布局引导扩展到给定的实际图像的布局和上下文编辑任务来进一步展示其多功能性。

相关内容

生成器

关注 2

生成器是一次生成一个值的特殊类型函数。可以将其视为可恢复函数。调用该函数将返回一个可用于生成连续 x 值的生成【Generator】，简单的说就是在函数的执行过程中，yield语句会把你需要的值返回给调用生成器的地方，然后退出函数，下一次调用生成器函数的时候又从上次中断的地方开始执行，而生成器内的所有变量参数都会被保存下来供下一次使用。

【ToG 2021】强化学习中图像局部区域敏感的探索奖励，Deep Reinforcement Learning with Part-aware Exploration Bonus in Video Games

专知会员服务

16+阅读 · 2022年3月29日

【Hugging Face】使用自定义数据集微调语义分割模型，Fine-Tune a Semantic Segmentation Model with a Custom Dataset

专知会员服务

21+阅读 · 2022年3月18日

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

专知会员服务

27+阅读 · 2022年3月3日

【CVPR 2022】可控图像合成与编辑的合成生成先验学习，SemanticStyleGAN: Learning Compositonal Generative Priors for Controllable Image Synthesis and Editing

专知会员服务

23+阅读 · 2022年3月3日