We propose MvDeCor, a method that utilizes self-supervised techniques in the 2D domain for fine-grained 3D shape segmentation tasks. This is inspired by the observation that view-based surface representations are more effective at modeling high-resolution surface details and texture than their 3D counterparts based on point clouds or voxel occupancy. Specifically, given a 3D shape, we render it from multiple views and set up a dense correspondence learning task within a contrastive learning framework. As a result, the learned 2D representations are view-invariant and geometrically consistent, leading to better generalization when trained on a limited number of labeled shapes than alternatives that utilize self-supervision in 2D or 3D modalities alone. Experiments on textured (RenderPeople) and untextured (PartNet) 3D datasets show that our method outperforms state-of-the-art alternatives in fine-grained part segmentation. The improvements over baselines are greater when only a sparse set of views is available for training or when shapes are textured, indicating that MvDeCor benefits from both 2D processing and 3D geometric reasoning.
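To make the dense correspondence objective concrete, below is a minimal sketch, not the authors' implementation: the pixel-pair sampling, temperature value, and tensor shapes are assumptions. It shows a pixel-level InfoNCE loss applied to corresponding pixels in two renderings of the same shape, where the correspondences come from the renderer's knowledge of which 3D surface point each pixel depicts.

```python
import torch
import torch.nn.functional as F

def dense_info_nce(feat_a, feat_b, idx_a, idx_b, tau=0.07):
    """Pixel-level contrastive loss between two rendered views.

    feat_a, feat_b: (C, H, W) per-pixel embeddings from a shared 2D network.
    idx_a, idx_b:   (N, 2) pixel coordinates (row, col) that project to the
                    same 3D surface points in the two views, i.e. the dense
                    correspondences obtained from the renderer (hypothetical
                    input format).
    """
    # Gather embeddings of corresponding pixels and L2-normalize them.
    za = F.normalize(feat_a[:, idx_a[:, 0], idx_a[:, 1]].t(), dim=1)  # (N, C)
    zb = F.normalize(feat_b[:, idx_b[:, 0], idx_b[:, 1]].t(), dim=1)  # (N, C)
    # Each pixel's true correspondence is its positive; the other sampled
    # pixels in the second view serve as negatives.
    logits = za @ zb.t() / tau                  # (N, N) similarity matrix
    targets = torch.arange(za.shape[0], device=za.device)
    return F.cross_entropy(logits, targets)
```

In practice such a loss would typically be applied symmetrically across the view pair and averaged over many sampled shapes per batch; the pretrained per-pixel embeddings can then be fine-tuned on the small labeled set for part segmentation.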