We address the problem of scene layout generation for diverse domains such as images, mobile applications, documents, and 3D objects. Most complex scenes, natural or human-designed, can be expressed as a meaningful arrangement of simpler compositional graphical primitives. Generating a new layout or extending an existing layout requires understanding the relationships between these primitives. To do this, we propose LayoutTransformer, a novel framework that leverages self-attention to learn contextual relationships between layout elements and generate novel layouts in a given domain. Our framework allows us to generate a new layout either from an empty set or from an initial seed set of primitives, and can easily scale to support an arbitrary number of primitives per layout. Furthermore, our analyses show that the model is able to automatically capture the semantic properties of the primitives. We propose simple improvements in both the representation of layout primitives and the training methods to demonstrate competitive performance across very diverse data domains: object bounding boxes in natural images (COCO), documents (PubLayNet), mobile applications (RICO), and 3D shapes (PartNet). Code and other materials will be made available at https://kampta.github.io/layout.
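For intuition, the sketch below shows one way the self-attention-based, autoregressive generation described above could look in code: a decoder-only Transformer over a flat sequence of discretized layout tokens (e.g., category, x, y, w, h per primitive), sampling new primitives from an empty or seed sequence. This is a minimal sketch under assumed settings; the class names, vocabulary size, and hyperparameters are illustrative and not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class LayoutTransformerSketch(nn.Module):
    """Decoder-only Transformer over a flat sequence of discretized layout
    tokens (category, x, y, w, h per primitive). Hyperparameters are
    illustrative assumptions, not the paper's exact configuration."""

    def __init__(self, vocab_size=300, d_model=256, nhead=8,
                 num_layers=6, max_len=512):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, nhead, dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        # tokens: (batch, seq_len) integer ids of discretized layout attributes
        seq_len = tokens.size(1)
        pos = torch.arange(seq_len, device=tokens.device)
        x = self.token_emb(tokens) + self.pos_emb(pos)
        # Causal mask: each position attends only to earlier layout tokens.
        mask = torch.triu(
            torch.full((seq_len, seq_len), float("-inf"), device=tokens.device),
            diagonal=1)
        h = self.encoder(x, mask=mask)
        return self.head(h)  # next-token logits

@torch.no_grad()
def sample_layout(model, seed_tokens, num_new_tokens, temperature=1.0):
    """Autoregressively extend a seed sequence (a lone start token acts as
    the 'empty set' case; a partial layout acts as the seed-set case)."""
    tokens = seed_tokens
    for _ in range(num_new_tokens):
        logits = model(tokens)[:, -1] / temperature
        next_tok = torch.multinomial(torch.softmax(logits, dim=-1), 1)
        tokens = torch.cat([tokens, next_tok], dim=1)
    return tokens

# Usage: start from an assumed start-of-layout token (id 0) and sample
# five primitives of five tokens each (category, x, y, w, h).
model = LayoutTransformerSketch()
start = torch.zeros(1, 1, dtype=torch.long)
generated = sample_layout(model, start, num_new_tokens=25)
```

Because the sequence simply grows token by token, the same sampling loop covers both generation from scratch and completion of a partial layout, which is what lets the approach scale to an arbitrary number of primitives per layout.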