We introduce PlatonicGAN to discover the 3D structure of an object class from an unstructured collection of 2D images, i.e., where no relation between the photos is known except that they show instances of the same category. The key idea is to train a deep neural network to generate 3D shapes which, when rendered to images under various camera poses, are indistinguishable from ground-truth images for a discriminator. Discriminating 2D images instead of 3D shapes allows tapping into unstructured 2D photo collections instead of relying on curated (e.g., aligned, annotated) 3D data sets. To establish constraints between 2D image observations and their 3D interpretation, we suggest a family of effectively differentiable rendering layers. This family includes visual hull, absorption-only (akin to X-ray), and emission-absorption. We successfully reconstruct 3D shapes from unstructured 2D images and extensively evaluate PlatonicGAN on a range of synthetic and real data sets, achieving consistent improvements over baseline methods. We further show that PlatonicGAN can be combined with 3D supervision to improve on, and in some cases even surpass, the quality of 3D-supervised methods.
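To make the family of rendering layers concrete, the following is a minimal PyTorch sketch of what such differentiable projection operators could look like, assuming a scalar voxel grid of shape (B, D, H, W) (or an RGBA grid for emission-absorption) with the viewing ray running along the depth axis; the function names, tensor layout, and step size are illustrative assumptions, not the paper's implementation.

```python
import torch

def visual_hull(occupancy, dim=1):
    # Visual-hull projection of a soft occupancy volume (B, D, H, W):
    # a pixel is covered if any voxel along the viewing axis is occupied,
    # expressed as a differentiable "soft or" via products.
    return 1.0 - torch.prod(1.0 - occupancy, dim=dim)

def absorption_only(density, dim=1, step=1.0):
    # X-ray-like rendering: exponential attenuation of a unit-intensity ray
    # by the density accumulated along the viewing axis.
    return torch.exp(-step * density.sum(dim=dim))

def emission_absorption(rgb, alpha, dim=2):
    # Front-to-back compositing of an RGBA volume.
    # rgb: (B, 3, D, H, W), alpha: (B, 1, D, H, W); `dim` indexes depth D.
    # Transmittance in front of each slice (exclusive cumulative product).
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha.narrow(dim, 0, 1)), 1.0 - alpha], dim=dim),
        dim=dim,
    ).narrow(dim, 0, alpha.size(dim))
    weights = alpha * trans              # per-voxel contribution to the pixel
    return (weights * rgb).sum(dim=dim)  # rendered image (B, 3, H, W)
```

In the adversarial setup described above, a generated voxel grid would be rotated to a sampled camera pose and passed through one of these layers before being handed to the 2D discriminator; since every operation is differentiable, the discriminator's gradient flows back through the rendering into the 3D generator.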