We present a novel 3D shape reconstruction method that learns to predict an implicit 3D shape representation from a single RGB image. Our approach trains on single-view images spanning multiple object categories, with neither viewpoint annotations nor 3D supervision, forcing the model to learn shape jointly across categories. To facilitate learning under such minimal supervision, we use category labels to guide shape learning with a novel categorical metric learning approach. We further apply adversarial and viewpoint regularization to disentangle the effects of viewpoint and shape. We obtain the first results for large-scale (more than 50 categories) single-view shape prediction using a single model without any 3D cues, and we are the first to examine and quantify the benefit of class information in single-view-supervised 3D shape reconstruction. Our method outperforms state-of-the-art methods on ShapeNet-13, ShapeNet-55, and Pascal3D+.
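To make the categorical metric learning idea concrete: the abstract does not specify the exact loss, but one plausible instance is a triplet margin loss over per-image shape embeddings, where category labels select positives (same category) and negatives (different category). The function below is a minimal illustrative sketch under that assumption; the name and margin value are not from the paper.

```python
import numpy as np

def categorical_triplet_loss(anchor, positive, negative, margin=0.2):
    """Illustrative triplet margin loss for category-guided shape embeddings.

    anchor / positive: embeddings of shapes from the SAME category;
    negative: embedding of a shape from a DIFFERENT category.
    Pulls same-category embeddings together and pushes different
    categories apart by at least `margin`. (Hypothetical sketch, not
    the paper's exact formulation.)
    """
    d_pos = np.linalg.norm(anchor - positive)  # same-category distance
    d_neg = np.linalg.norm(anchor - negative)  # cross-category distance
    return max(0.0, d_pos - d_neg + margin)
```

In practice such a term would be added to the reconstruction objective, so that images of, say, two different chairs map to nearby shape codes even though each chair is seen from only one viewpoint.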