3D shape is a crucial but heavily underutilized cue in today's computer vision systems, mostly due to the lack of a good generic shape representation. With the recent availability of inexpensive 2.5D depth sensors (e.g., Microsoft Kinect), it is becoming increasingly important to have a powerful 3D shape representation in the loop. Apart from category recognition, recovering full 3D shapes from view-based 2.5D depth maps is also a critical part of visual understanding. To this end, we propose to represent a geometric 3D shape as a probability distribution of binary variables on a 3D voxel grid, using a Convolutional Deep Belief Network. Our model, 3D ShapeNets, learns the distribution of complex 3D shapes across different object categories and arbitrary poses from raw CAD data, and automatically discovers hierarchical compositional part representations. It naturally supports joint object recognition and shape completion from 2.5D depth maps, and it enables active object recognition through view planning. To train our 3D deep learning model, we construct ModelNet, a large-scale 3D CAD model dataset. Extensive experiments show that our 3D deep representation yields significant performance improvements over the state of the art in a variety of tasks.
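To make the voxel-grid representation concrete, the following is a minimal sketch of how a shape could be turned into the binary variables that the model places a distribution over. It is not the authors' pipeline: the function name `voxelize`, the grid resolution of 30, and the use of sampled surface points in place of a CAD mesh are all assumptions made for illustration. The learned distribution itself (the Convolutional Deep Belief Network) is not shown.

```python
import numpy as np


def voxelize(points, resolution=30):
    """Map an (N, 3) array of surface points to a binary occupancy grid.

    Hypothetical helper: `resolution` is an assumed value, and each voxel
    is set to 1 if at least one point falls inside it, 0 otherwise.
    """
    points = np.asarray(points, dtype=float)
    lo = points.min(axis=0)
    # Scale uniformly by the longest side so the aspect ratio is preserved.
    extent = (points.max(axis=0) - lo).max()
    if extent == 0:
        extent = 1.0  # degenerate input: all points coincide
    idx = np.floor((points - lo) / extent * (resolution - 1)).astype(int)
    grid = np.zeros((resolution,) * 3, dtype=np.uint8)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return grid


# Stand-in for a CAD model: points sampled on the surface of a unit sphere.
rng = np.random.default_rng(0)
p = rng.normal(size=(5000, 3))
p /= np.linalg.norm(p, axis=1, keepdims=True)

grid = voxelize(p)
print(grid.shape, int(grid.sum()))
```

Each entry of `grid` is one binary variable. A generative model over such grids assigns a probability to every possible occupancy pattern, which is what allows recognition and shape completion to be posed as inference over the same representation.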