Recent work has demonstrated that generative models of 3D content can be trained from 2D image collections, but only on small datasets limited to a single object class, such as human faces, animal faces, or cars. These models struggle on larger, more complex datasets. To model diverse and unconstrained image collections such as ImageNet, we present VQ3D, which introduces a NeRF-based decoder into a two-stage vector-quantized autoencoder. Stage 1 reconstructs an input image and supports novel views of it by moving the camera; Stage 2 generates new 3D scenes. VQ3D is capable of generating and reconstructing 3D-aware images from the 1000-class ImageNet dataset of 1.2 million training images. We achieve an ImageNet generation FID score of 16.8, compared to 69.8 for the next best baseline method.
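To make the two-stage layout concrete, here is a minimal PyTorch sketch of a vector-quantized autoencoder whose decoder is conditioned on a camera pose. All names here (`VectorQuantizer`, `Stage1`, `pose_proj`) are hypothetical illustrations, not VQ3D's actual modules; in particular, the plain convolutional decoder stands in for the paper's NeRF-based decoder (its volume rendering is elided for brevity), and the Stage 2 prior is only noted in a comment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """Nearest-neighbor codebook lookup with a straight-through gradient."""
    def __init__(self, num_codes=1024, dim=256):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z):                                # z: (B, dim, H, W)
        B, C, H, W = z.shape
        flat = z.permute(0, 2, 3, 1).reshape(-1, C)      # (B*H*W, dim)
        idx = torch.cdist(flat, self.codebook.weight).argmin(dim=1)
        q = self.codebook(idx).view(B, H, W, C).permute(0, 3, 1, 2)
        # codebook + commitment losses, as in standard VQ-VAE training
        commit = F.mse_loss(z, q.detach()) + F.mse_loss(q, z.detach())
        q = z + (q - z).detach()                         # straight-through estimator
        return q, idx.view(B, H * W), commit

class Stage1(nn.Module):
    """Encoder -> VQ -> pose-conditioned decoder. The conv decoder is a
    stand-in: VQ3D's actual Stage 1 decoder is NeRF-based, rendering the
    quantized latents volumetrically from the given camera pose."""
    def __init__(self, dim=256):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(64, dim, 4, 2, 1),
        )
        self.vq = VectorQuantizer(dim=dim)
        self.pose_proj = nn.Linear(12, dim)              # flattened 3x4 camera matrix
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(dim, 64, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, 2, 1),
        )

    def forward(self, img, pose):
        q, idx, commit = self.vq(self.enc(img))
        q = q + self.pose_proj(pose)[:, :, None, None]   # inject camera pose
        return self.dec(q), idx, commit

img, pose = torch.randn(2, 3, 64, 64), torch.randn(2, 12)
recon, idx, commit = Stage1()(img, pose)
loss = F.mse_loss(recon, img) + commit
# Stage 2 would fit an autoregressive prior (e.g. a transformer) over `idx`,
# so that sampling new index sequences yields new 3D scenes.
```

The straight-through estimator lets reconstruction gradients flow through the non-differentiable codebook lookup, which is why the same frozen codebook can later serve as the vocabulary for the Stage 2 generative prior.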