Novel view synthesis from a single image has recently achieved remarkable results, although the requirement of some form of 3D, pose, or multi-view supervision at training time limits deployment in real scenarios. This work aims to relax these assumptions, enabling the training of conditional generative models for novel view synthesis in a completely unsupervised manner. We first pre-train a purely generative decoder model using a 3D-aware GAN formulation, while at the same time training an encoder network to invert the mapping from latent space to images. Then, we swap encoder and decoder and train the network as a conditional GAN with a mixture of an autoencoder-like objective and self-distillation. At test time, given a view of an object, our model first embeds the image content in a latent code and regresses its pose; it then generates novel views by keeping the code fixed and varying the pose. We test our framework on both synthetic datasets, such as ShapeNet, and unconstrained collections of natural images, where no competing method can be trained.
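To make the test-time procedure concrete, the following is a minimal PyTorch sketch, not the authors' implementation: the `Encoder` and `Generator` modules, the latent dimensionality, and the pose parameterization (here a simple azimuth/elevation pair) are all hypothetical stand-ins for the paper's actual architecture. It only illustrates the inference flow described above: encode the input image into a content code and a regressed pose, then decode with the code fixed while sweeping the pose.

```python
# Hypothetical sketch of the inference procedure; all module names,
# sizes, and the pose parameterization are illustrative assumptions.
import torch
import torch.nn as nn

LATENT_DIM, POSE_DIM, IMG = 128, 2, 64  # assumed sizes

class Encoder(nn.Module):
    """Maps an image to a content latent code and a regressed pose."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.to_latent = nn.Linear(64, LATENT_DIM)
        self.to_pose = nn.Linear(64, POSE_DIM)

    def forward(self, img):
        h = self.backbone(img)
        return self.to_latent(h), self.to_pose(h)

class Generator(nn.Module):
    """Decodes a (content code, pose) pair into an image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM + POSE_DIM, 64 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, z, pose):
        return self.net(torch.cat([z, pose], dim=1))

@torch.no_grad()
def novel_views(encoder, generator, img, azimuths):
    """Fix the content code inferred from `img`; vary only the pose."""
    z, pose = encoder(img)
    views = []
    for az in azimuths:
        p = pose.clone()
        p[:, 0] = az  # sweep azimuth, keep elevation and content fixed
        views.append(generator(z, p))
    return torch.cat(views)

encoder, generator = Encoder(), Generator()  # stand-ins for trained networks
img = torch.rand(1, 3, IMG, IMG)
views = novel_views(encoder, generator, img, azimuths=[-0.5, 0.0, 0.5])
print(views.shape)  # torch.Size([3, 3, 64, 64])
```

In the paper's actual pipeline the generator would come from the 3D-aware GAN pre-training stage and the encoder from the subsequent conditional-GAN stage; the sketch assumes both are already trained and only shows how they are composed at test time.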