Neural rendering techniques promise efficient, photo-realistic image synthesis while providing rich control over scene parameters by learning the physical image formation process. Several supervised methods have been proposed for this task, but they require datasets of images with accurately aligned 3D models, which are very difficult to acquire. The main contribution of this work is to lift this restriction by training a neural rendering algorithm from unpaired data. More specifically, we propose an autoencoder that jointly generates realistic images from synthetic 3D models and decomposes real images into their intrinsic shape and appearance properties. In contrast to a traditional graphics pipeline, our approach does not require all scene properties, such as material parameters and lighting, to be specified by hand. Instead, we learn photo-realistic deferred rendering from a small set of 3D models and a larger set of unaligned real images, both of which are easy to acquire in practice. At the same time, we obtain accurate intrinsic decompositions of real images without requiring paired ground truth. Our experiments confirm that treating rendering and decomposition jointly is beneficial and that our approach outperforms state-of-the-art image-to-image translation baselines both qualitatively and quantitatively.
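One common way to train a render/decompose pair from unpaired data is with cycle-consistency losses, as in CycleGAN-style image-to-image translation. The abstract does not spell out its loss terms, so the sketch below is an assumption, not the authors' method. It uses toy per-pixel linear maps as stand-ins for the rendering and decomposition networks, and a hypothetical channel layout of 3 albedo plus 3 surface-normal channels. It computes the two cycle terms for one synthetic sample and one unrelated real photograph. The adversarial losses and the actual learning are omitted.

```python
import numpy as np

# Hypothetical channel layout (an assumption, not the paper's exact choice):
# 3 albedo channels + 3 surface-normal channels per pixel.
N_INTRINSIC = 6
N_RGB = 3


def apply_per_pixel(weights: np.ndarray, maps: np.ndarray) -> np.ndarray:
    """Apply a per-pixel linear map: (C_out, C_in) x (H, W, C_in) -> (H, W, C_out)."""
    return np.einsum("oc,hwc->hwo", weights, maps)


def l1(a: np.ndarray, b: np.ndarray) -> float:
    """Mean absolute error between two arrays of equal shape."""
    return float(np.mean(np.abs(a - b)))


def cycle_losses(render_w, decompose_w, synthetic_maps, real_image):
    """Return the two cycle-consistency terms for one unpaired sample pair.

    synthetic_maps: (H, W, N_INTRINSIC) intrinsic maps rasterized from a 3D model.
    real_image:     (H, W, N_RGB) photograph with no aligned 3D model.
    The two inputs are unrelated: no pixel correspondence is assumed.
    """
    # Synthetic cycle: intrinsics -> rendered image -> recovered intrinsics.
    rendered = apply_per_pixel(render_w, synthetic_maps)
    recovered_maps = apply_per_pixel(decompose_w, rendered)
    loss_synthetic = l1(recovered_maps, synthetic_maps)

    # Real cycle: photograph -> estimated intrinsics -> re-rendered photograph.
    estimated_maps = apply_per_pixel(decompose_w, real_image)
    rerendered = apply_per_pixel(render_w, estimated_maps)
    loss_real = l1(rerendered, real_image)
    return loss_synthetic, loss_real


rng = np.random.default_rng(0)
render_w = rng.normal(size=(N_RGB, N_INTRINSIC))
# The pseudo-inverse makes render(decompose(image)) exact for these toy linear maps.
decompose_w = np.linalg.pinv(render_w)

synthetic_maps = rng.uniform(size=(8, 8, N_INTRINSIC))
real_image = rng.uniform(size=(16, 16, N_RGB))
loss_syn, loss_real = cycle_losses(render_w, decompose_w, synthetic_maps, real_image)
print(f"synthetic cycle: {loss_syn:.3f}, real cycle: {loss_real:.3e}")
```

In this toy, the real cycle closes almost exactly. The synthetic cycle does not, because a 3-channel image cannot uniquely determine 6 intrinsic channels under a linear map. This ambiguity is the kind that learned, non-linear networks and additional losses are meant to resolve.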