Graphic layout designs play an essential role in visual communication. Yet handcrafting layout designs is skill-demanding, time-consuming, and does not scale to batch production. Generative models have emerged to make design automation scalable, but it remains non-trivial to produce designs that comply with designers' multimodal desires, i.e., designs constrained by background images and driven by foreground content. We propose LayoutDETR, which inherits the high quality and realism of generative modeling while reformulating content-aware requirements as a detection problem: we learn to detect, in a background image, the reasonable locations, scales, and spatial relations for the multimodal foreground elements of a layout. Our solution achieves new state-of-the-art performance for layout generation on public benchmarks and on our newly curated ad banner dataset. We integrate our solution into a graphical system that facilitates user studies, and show that users prefer our designs over baselines by significant margins. Our code, models, dataset, graphical system, and demos are available at https://github.com/salesforce/LayoutDETR.