Recent conditional image synthesis approaches produce high-quality images. However, it remains challenging to accurately adjust image contents such as the positions and orientations of objects, and synthesized images often contain geometrically invalid contents. To give users rich control over synthesized images in terms of 3D geometry, we propose a novel approach to realistic-looking image synthesis based on a configurable 3D scene layout. Our approach takes a 3D scene with semantic class labels as input and trains a 3D scene painting network that synthesizes color values for the input 3D scene. With the trained painting network, realistic-looking images of the input 3D scene can be rendered and manipulated. To train the painting network without 3D color supervision, we exploit an off-the-shelf 2D semantic image synthesis method. In experiments, we show that our approach produces images with geometrically correct structures and supports geometric manipulation, such as changes of viewpoint and object poses, as well as manipulation of the painting style.