In this paper, we study the graphic layout generation problem of producing high-quality visual-textual presentation designs for given images. We note that image compositions, which contain not only global semantics but also spatial information, largely affect layout results. Hence, we propose a deep generative model, dubbed composition-aware graphic layout GAN (CGL-GAN), to synthesize layouts based on the global and spatial visual contents of input images. To obtain training inputs from images that already contain manually designed graphic layouts, previous work suggests masking design elements (e.g., texts and embellishments), which inevitably leaves hints of the ground truth. We study the misalignment between the training inputs (with hint masks) and test inputs (without masks), and design a novel domain alignment module (DAM) to narrow this gap. For training, we construct a large-scale layout dataset consisting of 60,548 advertising posters with annotated layout information. To evaluate the generated layouts, we propose three novel metrics motivated by aesthetic intuitions. Through both quantitative and qualitative evaluations, we demonstrate that the proposed model can synthesize high-quality graphic layouts according to image compositions.