Spatial control is a core capability in controllable image generation. Advances in layout-guided image generation have shown promising results on in-distribution (ID) datasets with similar spatial configurations. However, it is unclear how these models perform when facing out-of-distribution (OOD) samples with arbitrary, unseen layouts. In this paper, we propose LayoutBench, a diagnostic benchmark for layout-guided image generation that examines four categories of spatial control skills: number, position, size, and shape. We benchmark two recent representative layout-guided image generation methods and observe that good ID layout control may not generalize well to arbitrary layouts in the wild (e.g., objects at the boundary). Next, we propose IterInpaint, a new baseline that generates foreground and background regions step by step via inpainting and demonstrates stronger generalizability than existing models on the OOD layouts in LayoutBench. We perform quantitative and qualitative evaluation and fine-grained analysis on the four LayoutBench skills to pinpoint the weaknesses of existing models. Lastly, we present comprehensive ablation studies on IterInpaint, including the training task ratio, crop&paste vs. repaint, and generation order. Project website: https://layoutbench.github.io