This work explores the use of 3D generative models to synthesize training data for 3D vision tasks. The key requirements for the generative models are that the generated data should be photorealistic to match real-world scenarios, and the corresponding 3D attributes should be aligned with given sampling labels. However, we find that recent NeRF-based 3D GANs hardly meet the above requirements due to their generation pipeline design and the lack of explicit 3D supervision. In this work, we propose Lift3D, an inverted 2D-to-3D generation framework that achieves these data generation objectives. Lift3D has several merits compared to prior methods: (1) Unlike previous 3D GANs, whose output resolution is fixed after training, Lift3D can generalize to any camera intrinsics, producing higher-resolution and photorealistic output. (2) By lifting a well-disentangled 2D GAN to a 3D object NeRF, Lift3D provides explicit 3D information about the generated objects, thus offering accurate 3D annotations for downstream tasks. We evaluate the effectiveness of our framework by augmenting autonomous driving datasets. Experimental results demonstrate that our data generation framework can effectively improve the performance of 3D object detectors. Project page: https://len-li.github.io/lift3d-web.