Accurately predicting the 3D shape of an arbitrary object in an arbitrary pose from a single image is a key goal of computer vision research. This is challenging because it requires a model to learn a representation that can infer both the visible and occluded portions of any object from a limited training set, and a training set that covers all possible object shapes is inherently infeasible. Such learning-based approaches are therefore vulnerable to overfitting, and their success depends on both the architecture design and the training approach. We present an extensive investigation of factors in architecture design, training, experiment design, and evaluation that influence reconstruction performance and its measurement. We show that our proposed SDFNet achieves state-of-the-art performance on both seen and unseen shapes relative to the existing methods GenRe and OccNet, and we provide the first large-scale evaluation of single-image shape reconstruction on unseen objects. The source code, data, and trained models can be found at https://github.com/rehg-lab/3DShapeGen.