Generative models able to synthesize layouts of different kinds (e.g., documents, user interfaces, or furniture arrangements) are useful tools to aid design processes and serve as a first step in the generation of synthetic data, among other tasks. We exploit the properties of self-attention layers to capture high-level relationships between elements in a layout, and use these as the building blocks of the well-known Variational Autoencoder (VAE) formulation. Our proposed Variational Transformer Network (VTN) is capable of learning margins, alignments, and other global design rules without explicit supervision. Layouts sampled from our model have a high degree of resemblance to the training data, while demonstrating appealing diversity. In an extensive evaluation on publicly available benchmarks for different layout types, VTNs achieve state-of-the-art diversity and perceptual quality. Additionally, we show the capabilities of this method as part of a document layout detection pipeline.
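The core idea above, using self-attention over layout elements as the encoder of a VAE, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; all weight matrices, dimensions, and names here are illustrative assumptions, and the real VTN uses full Transformer blocks trained end to end.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (n_elements, d) -- each row embeds one layout element
    # (e.g. a bounding box plus a class label).
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Every element attends to every other, which is how relationships
    # such as alignments and margins can be captured without supervision.
    return softmax(scores) @ V

d, n = 8, 5                       # embedding size, number of layout elements
X = rng.normal(size=(n, d))       # toy layout-element embeddings
Wq, Wk, Wv = (0.1 * rng.normal(size=(d, d)) for _ in range(3))
Wmu, Wlv = 0.1 * rng.normal(size=(d, d)), 0.1 * rng.normal(size=(d, d))

H = self_attention(X, Wq, Wk, Wv)  # relational encoding of the layout
mu, logvar = H @ Wmu, H @ Wlv      # parameters of the VAE posterior
# Reparameterization trick: sample a latent code z ~ N(mu, sigma^2).
z = mu + np.exp(0.5 * logvar) * rng.normal(size=mu.shape)
print(z.shape)  # (5, 8): one latent vector per layout element
```

A decoder (also attention-based in the VTN) would then map `z` back to element coordinates; sampling `z` from the prior yields novel layouts.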