Differentiable rendering has paved the way to training neural networks to perform "inverse graphics" tasks such as predicting 3D geometry from monocular photographs. To train high-performing models, most of the current approaches rely on multi-view imagery, which is not readily available in practice. Recent Generative Adversarial Networks (GANs) that synthesize images, in contrast, seem to acquire 3D knowledge implicitly during training: object viewpoints can be varied by simply manipulating the latent codes. However, these latent codes often lack further physical interpretation, and thus GANs cannot easily be inverted to perform explicit 3D reasoning. In this paper, we aim to extract and disentangle the 3D knowledge learned by generative models by utilizing differentiable renderers. Key to our approach is to exploit GANs as a multi-view data generator to train an inverse graphics network using an off-the-shelf differentiable renderer, and to use the trained inverse graphics network as a teacher to disentangle the GAN's latent code into interpretable 3D properties. The entire architecture is trained iteratively using cycle consistency losses. We show that our approach significantly outperforms state-of-the-art inverse graphics networks trained on existing datasets, both quantitatively and via user studies. We further showcase the disentangled GAN as a controllable 3D "neural renderer", complementing traditional graphics renderers.
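For concreteness, the first stage of this training scheme can be sketched as follows. This is a minimal PyTorch sketch under stated assumptions, not the paper's actual implementation: `GANGenerator`, `InverseGraphicsNet`, and `differentiable_render` are hypothetical placeholders, the renderer stand-in merely keeps inputs in the autograd graph where a real off-the-shelf differentiable renderer would rasterize a textured mesh, and the second stage (the trained network acting as a teacher to disentangle the GAN's latent code) is omitted.

```python
# Minimal sketch: a frozen GAN serves as a multi-view data generator for
# training an inverse graphics network with a reconstruction (cycle) loss.
# All module/function names below are hypothetical placeholders.
import torch
import torch.nn as nn

LATENT_DIM, IMG = 128, 64  # assumed latent size and image resolution
N_VERTS = 642              # vertices of a fixed-topology template mesh

class GANGenerator(nn.Module):
    """Stand-in for a frozen, pretrained image GAN. In the real system the
    viewpoint is controlled by manipulating the latent code; here we pass it
    explicitly for clarity."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(LATENT_DIM + 3, 3 * IMG * IMG)
    def forward(self, z, view):
        x = self.net(torch.cat([z, view], dim=-1))
        return torch.sigmoid(x).view(-1, 3, IMG, IMG)

class InverseGraphicsNet(nn.Module):
    """Stand-in for F: image -> (vertex offsets, per-vertex texture)."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Flatten(),
                                 nn.Linear(3 * IMG * IMG, 256), nn.ReLU())
        self.verts = nn.Linear(256, N_VERTS * 3)
        self.tex = nn.Linear(256, N_VERTS * 3)
    def forward(self, img):
        h = self.enc(img)
        return (self.verts(h).view(-1, N_VERTS, 3),
                torch.sigmoid(self.tex(h)).view(-1, N_VERTS, 3))

def differentiable_render(verts, tex, view):
    """Placeholder for an off-the-shelf differentiable renderer (e.g. DIB-R);
    a real implementation rasterizes the textured mesh from `view`."""
    img = tex.mean(dim=1)[:, :, None, None].expand(-1, -1, IMG, IMG)
    return img + 0.0 * verts.sum() + 0.0 * view.sum()  # keeps inputs in graph

G = GANGenerator().eval()        # frozen: acts as a multi-view "dataset"
for p in G.parameters():
    p.requires_grad_(False)
F = InverseGraphicsNet()
opt = torch.optim.Adam(F.parameters(), lr=1e-4)

for step in range(1000):
    z = torch.randn(8, LATENT_DIM)
    views = torch.randn(8, 3)            # sampled viewpoints per object
    imgs = G(z, views)                   # GAN synthesizes multi-view imagery
    verts, tex = F(imgs)                 # predict 3D from each single image
    recon = differentiable_render(verts, tex, views)
    loss = (recon - imgs).abs().mean()   # cycle-consistency reconstruction loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The design point the sketch illustrates is that the generator is frozen and only the inverse graphics network receives gradients: the GAN supplies effectively unlimited multi-view training imagery, sidestepping the scarcity of real multi-view datasets noted above.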