Geometry Aligned Variational Transformer for Image-conditioned Layout Generation

Layout generation is a novel task in computer vision that combines the challenges of object localization and aesthetic appraisal, and it is widely used in the design of advertisements, posters, and slides. An accurate and pleasant layout should consider both the intra-domain relationship among layout elements and the inter-domain relationship between layout elements and the image. However, most previous methods focus on image-content-agnostic layout generation and do not leverage the complex visual information in the image. To this end, we explore a novel paradigm, image-conditioned layout generation, which aims to add text overlays to an image in a semantically coherent manner. Specifically, we propose an Image-Conditioned Variational Transformer (ICVT) that autoregressively generates various layouts for an image. First, a self-attention mechanism is adopted to model the contextual relationships among layout elements, while a cross-attention mechanism is used to fuse the visual information of the conditioning image. We then use these as the building blocks of a conditional variational autoencoder (CVAE), which demonstrates appealing diversity. Second, to narrow the gap between the layout-element domain and the visual domain, we design a Geometry Alignment module, in which the geometric information of the image is aligned with the layout representation. In addition, we construct a large-scale advertisement poster layout design dataset with delicate layout and saliency map annotations. Experimental results show that our model can adaptively generate layouts in the non-intrusive areas of the image, resulting in a harmonious layout design.
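
The attention pattern described above can be illustrated with a minimal NumPy sketch. It is not the ICVT implementation: the abstract does not specify layer sizes, learned projections, or how the CVAE latent is injected, so the single-head attention without projections, the token counts, the standard-normal prior, and the additive use of the latent `z` are all assumptions made for illustration. All function and variable names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(queries, keys, values, causal=False):
    """Single-head scaled dot-product attention without learned projections."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    if causal:
        # Hide future positions so element i only attends to elements <= i,
        # which is what autoregressive generation requires.
        mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    weights = softmax(scores, axis=-1)
    return weights @ values, weights


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps (the usual VAE reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


rng = np.random.default_rng(0)
d_model = 16                                         # assumed embedding width
layout_tokens = rng.standard_normal((5, d_model))    # 5 layout elements (assumed)
image_tokens = rng.standard_normal((49, d_model))    # e.g. a flattened 7x7 feature map (assumed)

# Self-attention: each layout element attends to the earlier layout elements.
context, self_weights = attention(
    layout_tokens, layout_tokens, layout_tokens, causal=True
)

# Cross-attention: layout elements query the image features.
fused, cross_weights = attention(context, image_tokens, image_tokens)

# Latent variable drawn from an assumed standard-normal prior; adding it to
# every token is only one possible way to condition the decoder.
z = reparameterize(np.zeros(d_model), np.zeros(d_model), rng)
conditioned = fused + z
```

In this sketch the causal mask is applied only to the self-attention step, since every layout element may look at the whole image. Sampling a different `z` for the same image is what would let a CVAE-style model produce different layouts.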
