3D generation on ImageNet

Existing 3D-from-2D generators are typically designed for well-curated single-category datasets, where all the objects have (approximately) the same scale, 3D location, and orientation, and the camera always points to the center of the scene. This makes them inapplicable to diverse, in-the-wild datasets of non-alignable scenes rendered from arbitrary camera poses. In this work, we develop a 3D generator with Generic Priors (3DGP): a 3D synthesis framework with more general assumptions about the training data, and show that it scales to very challenging datasets, like ImageNet. Our model is based on three new ideas. First, we incorporate an inaccurate off-the-shelf depth estimator into 3D GAN training via a special depth adaptation module to handle the imprecision. Then, we create a flexible camera model and a regularization strategy for it to learn its distribution parameters during training. Finally, we extend the recent ideas of transferring knowledge from pre-trained classifiers into GANs for patch-wise trained models by employing a simple distillation-based technique on top of the discriminator. This technique achieves more stable training than the existing methods and speeds up the convergence by at least 40%. We explore our model on four datasets: SDIP Dogs 256x256, SDIP Elephants 256x256, LSUN Horses 256x256, and ImageNet 256x256, and demonstrate that 3DGP outperforms the recent state-of-the-art in terms of both texture and geometry quality. Code and visualizations: https://snap-research.github.io/3dgp.
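The abstract does not spell out the form of the distillation term added on top of the discriminator. The sketch below assumes one common form of feature distillation: a small linear head on the discriminator's features is trained to regress the embedding of a frozen pre-trained classifier with an L2 loss, and that term is added to a standard non-saturating GAN loss with weight `lam`. All names (`distill_step`, `discriminator_loss`, `head`, `lam`) and the toy feature vectors are hypothetical stand-ins; this is an illustration of the general idea, not the 3DGP implementation.

```python
import math

def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    assert len(a) == len(b)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def linear(weights, x):
    """Hypothetical distillation head: a bias-free linear map W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def distill_step(head, d_features, teacher_features, lr=0.1):
    """One gradient-descent step on the distillation MSE w.r.t. the head weights.

    dL/dW_ij = (2 / n) * (pred_i - t_i) * x_j, with n = len(teacher_features).
    """
    pred = linear(head, d_features)
    n = len(teacher_features)
    return [
        [w - lr * (2.0 / n) * (p - t) * x for w, x in zip(row, d_features)]
        for row, p, t in zip(head, pred, teacher_features)
    ]

def discriminator_loss(real_logit, fake_logit, d_features, teacher_features, head, lam=1.0):
    """Non-saturating GAN loss plus an (assumed) L2 feature-distillation term."""
    adv = softplus(-real_logit) + softplus(fake_logit)
    distill = mse(linear(head, d_features), teacher_features)
    return adv + lam * distill

# Toy usage: the head learns to match a frozen "teacher" embedding.
d_feat = [0.5, -1.0, 0.25]   # stand-in for discriminator features of one image
teacher = [1.0, 0.0]         # stand-in for a frozen classifier's embedding
head = [[0.0] * 3 for _ in range(2)]

before = mse(linear(head, d_feat), teacher)
for _ in range(200):
    head = distill_step(head, d_feat, teacher)
after = mse(linear(head, d_feat), teacher)
print(before, after)
```

In this toy setting the distillation error shrinks geometrically as the head is trained, so the combined loss approaches the pure adversarial term. In a real model the head and the discriminator backbone would be trained jointly, with the classifier kept frozen.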

Related content

ImageNet (dataset)

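The ImageNet project is a large visual database designed for use in visual object recognition software research. More than 14 million image URLs have been hand-annotated by ImageNet to indicate what objects are pictured; in at least one million of the images, bounding boxes are also provided. ImageNet contains more than 20,000 categories; [2] a typical category, such as "balloon" or "strawberry", contains several hundred images. The database of annotations of third-party image URLs is freely available directly from ImageNet, though the actual images are not owned by ImageNet. Since 2010, the ImageNet project has run an annual software contest, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), in which software programs compete to correctly classify and detect objects and scenes. The challenge uses a "trimmed" list of 1,000 non-overlapping classes. A major breakthrough on the ImageNet challenge in 2012 is widely regarded as the start of the deep learning revolution of the 2010s.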
