Spatial control is a core capability in controllable image generation. Advancements in layout-guided image generation have shown promising results on in-distribution (ID) datasets with similar spatial configurations. However, it is unclear how these models perform when facing out-of-distribution (OOD) samples with arbitrary, unseen layouts. In this paper, we propose LayoutBench, a diagnostic benchmark for layout-guided image generation that examines four categories of spatial control skills: number, position, size, and shape. We benchmark two recent representative layout-guided image generation methods and observe that the good ID layout control may not generalize well to arbitrary layouts in the wild (e.g., objects at the boundary). Next, we propose IterInpaint, a new baseline that generates foreground and background regions in a step-by-step manner via inpainting, demonstrating stronger generalizability than existing models on OOD layouts in LayoutBench. We perform quantitative and qualitative evaluation and fine-grained analysis on the four LayoutBench skills to pinpoint the weaknesses of existing models. Lastly, we show comprehensive ablation studies on IterInpaint, including training task ratio, crop&paste vs. repaint, and generation order. Project website: https://layoutbench.github.io