ATISS: Autoregressive Transformers for Indoor Scene Synthesis

The ability to synthesize realistic and diverse indoor furniture layouts, either automatically or from partial input, unlocks many applications, from better interactive 3D tools to data synthesis for training and simulation. In this paper, we present ATISS, a novel autoregressive transformer architecture for creating diverse and plausible synthetic indoor environments, given only the room type and its floor plan. In contrast to prior work, which poses scene synthesis as sequence generation, our model generates rooms as unordered sets of objects. We argue that this formulation is more natural, as it makes ATISS generally useful beyond fully automatic room layout synthesis. For example, the same trained model can be used in interactive applications for general scene completion, partial room re-arrangement with any objects specified by the user, and object suggestions for any partial room. To enable this, our model leverages the permutation equivariance of the transformer when conditioning on the partial scene, and is trained to be permutation-invariant across object orderings. Our model is trained end-to-end as an autoregressive generative model using only labeled 3D bounding boxes as supervision. Evaluations on four room types in the 3D-FRONT dataset demonstrate that our model consistently generates plausible room layouts that are more realistic than those of existing methods. In addition, it has fewer parameters, is simpler to implement and train, and runs up to 8 times faster than existing methods.
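The set-based formulation can be illustrated with a small sketch. This is not the ATISS implementation. It assumes one plausible way to build order-agnostic training pairs (shuffle the objects, keep a random prefix as the partial scene, and predict the next object or a stop marker), and it uses a simple averaging function as a stand-in for a transformer encoder without positional encoding. The names `Box`, `END`, `make_training_pair`, and `summarize_context` are hypothetical and chosen for illustration only.

```python
import math
import random
from typing import List, Optional, Tuple

# Hypothetical object record (an assumption, not the paper's exact format):
# (category, x, y, z, width, height, depth, angle)
Box = Tuple[str, float, float, float, float, float, float, float]

END = None  # hypothetical stop marker meaning "the scene is complete"


def make_training_pair(
    scene: List[Box], rng: random.Random
) -> Tuple[List[Box], Optional[Box]]:
    """Shuffle the objects, keep a random prefix as the partial scene,
    and return the next object (or END) as the prediction target."""
    shuffled = scene[:]
    rng.shuffle(shuffled)
    k = rng.randint(0, len(shuffled))  # randint includes both endpoints
    context = shuffled[:k]
    target = shuffled[k] if k < len(shuffled) else END
    return context, target


def summarize_context(context: List[Box]) -> Tuple[float, ...]:
    """Order-independent summary of a partial scene: the object count
    followed by the mean of each numeric attribute.

    A toy stand-in for a learned set encoder. math.fsum returns a
    correctly rounded sum, so the result does not depend on input order.
    """
    n = len(context)
    if n == 0:
        return (0.0,) * 8
    means = tuple(
        math.fsum(box[i] for box in context) / n for i in range(1, 8)
    )
    return (float(n),) + means


if __name__ == "__main__":
    rng = random.Random(0)
    scene: List[Box] = [
        ("bed", 1.0, 0.0, 2.0, 2.0, 0.5, 1.6, 0.0),
        ("wardrobe", 3.0, 0.0, 0.5, 1.2, 2.0, 0.6, 1.57),
        ("nightstand", 0.2, 0.0, 2.0, 0.4, 0.5, 0.4, 0.0),
    ]
    context, target = make_training_pair(scene, rng)
    print("context size:", len(context), "target:", target)
    print("summary:", summarize_context(context))
```

Because every ordering of the objects is equally likely during training and the context summary ignores order, a model trained on such pairs can be queried with any user-specified subset of objects, which is what makes scene completion and object suggestion possible with the same trained model.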
