Massive data corpora like WebText, Wikipedia, Conceptual Captions, WebImageText, and LAION have propelled recent dramatic progress in AI. Large neural models trained on such datasets produce impressive results and top many of today's benchmarks. A notable omission within this family of large-scale datasets is 3D data. Despite considerable interest and potential applications in 3D vision, datasets of high-fidelity 3D models continue to be mid-sized with limited diversity of object categories. Addressing this gap, we present Objaverse 1.0, a large dataset of objects with 800K+ (and growing) 3D models with descriptive captions, tags, and animations. Objaverse improves upon present-day 3D repositories in terms of scale, number of categories, and the visual diversity of instances within a category. We demonstrate the large potential of Objaverse via four diverse applications: training generative 3D models, improving tail-category segmentation on the LVIS benchmark, training open-vocabulary object-navigation models for Embodied AI, and creating a new benchmark for robustness analysis of vision models. Objaverse can open new directions for research and enable new applications across the field of AI.