Reconstructing the 3D geometry of an object from an image is a major challenge in computer vision. Recently introduced differentiable renderers can be leveraged to learn the 3D geometry of objects from 2D images, but these approaches require additional supervision so that the renderer can produce an output that is comparable to the input image. Such supervision can take the form of scene information or constraints such as object silhouettes, uniform backgrounds, material, texture, and lighting. In this paper, we propose an approach that enables differentiable rendering-based learning of 3D objects from images with backgrounds, without requiring silhouette supervision. Instead of trying to render an image that closely matches the input, we propose an adversarial style-transfer and domain adaptation pipeline that translates the input image domain into the rendered image domain. This allows us to directly compare a translated image with the differentiable rendering of a 3D object reconstruction in order to train the 3D object reconstruction network. We show that our approach learns 3D geometry from images with backgrounds and outperforms constrained methods for single-view 3D object reconstruction on this task.
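To make the pipeline described above concrete, the following is a minimal PyTorch-style sketch of the training objective under our own assumptions: the class names (`Translator`, `Reconstructor`, `Discriminator`), the tiny architectures, and the `render()` stand-in are illustrative placeholders rather than the authors' implementation, and a real system would use an actual differentiable mesh renderer (e.g., a soft rasterizer) in place of `render()`.

```python
# Hypothetical placeholder modules for the adversarial style-transfer +
# differentiable-rendering training step described in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F


def small_cnn(in_ch, out_ch):
    """Tiny conv stack standing in for the real network architectures."""
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 4, 2, 1), nn.ReLU(),
        nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),
        nn.Conv2d(64, out_ch, 3, 1, 1),
    )


class Translator(nn.Module):
    """Maps real images with backgrounds into the rendered-image domain."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, 1, 1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, 1, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)


class Reconstructor(nn.Module):
    """Predicts a 3D shape (here: mesh vertex offsets) from a single image."""
    def __init__(self, n_verts=642):
        super().__init__()
        self.n_verts = n_verts
        self.enc = nn.Sequential(
            small_cnn(3, 64), nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, n_verts * 3),
        )

    def forward(self, x):
        return self.enc(x).view(-1, self.n_verts, 3)


class Discriminator(nn.Module):
    """Scores whether an image belongs to the rendered-image domain."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(small_cnn(3, 1), nn.AdaptiveAvgPool2d(1), nn.Flatten())

    def forward(self, x):
        return self.net(x)


def render(vertices, image_size=64):
    """Purely illustrative differentiable stand-in for a mesh renderer."""
    b = vertices.shape[0]
    flat = vertices.mean(dim=(1, 2)).view(b, 1, 1, 1)  # collapse shape to a constant image
    return flat.expand(b, 3, image_size, image_size)


def training_step(real_images, recon, translator, disc):
    """One joint step: adversarial domain loss plus rendered-domain reconstruction loss."""
    translated = translator(real_images)                                 # real image -> rendered domain
    rendered = render(recon(real_images), real_images.shape[-1])         # predicted 3D shape -> rendering

    # Discriminator: renderings are the "real" target-domain samples,
    # translated images are "fake" until the translator matches the domain.
    pred_r = disc(rendered.detach())
    pred_t = disc(translated.detach())
    d_loss = (F.binary_cross_entropy_with_logits(pred_r, torch.ones_like(pred_r)) +
              F.binary_cross_entropy_with_logits(pred_t, torch.zeros_like(pred_t)))

    # Translator: fool the discriminator so translations look like renderings.
    pred_fool = disc(translated)
    adv_loss = F.binary_cross_entropy_with_logits(pred_fool, torch.ones_like(pred_fool))

    # Reconstruction: compare the rendering and the translated image directly,
    # with no silhouette, background, or lighting supervision.
    recon_loss = F.l1_loss(rendered, translated)
    return d_loss, adv_loss + recon_loss
```

The point of the sketch is that the reconstruction loss is computed in the rendered-image domain (rendering vs. translated image), so no silhouettes or background masks are needed; the adversarial loss is what pulls the translated images into that domain so the comparison is meaningful.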