The vast majority of 3D models that appear in gaming, VR/AR, and that we use to train geometric deep learning algorithms are incomplete: they are modeled as surface meshes and are missing their interior structures. We present a learning framework to recover the shape interiors (RoSI) of existing 3D models, given only their exteriors, from multi-view and multi-articulation images. Given a set of RGB images that capture a target 3D object in different articulated poses, possibly from only a few views, our method infers the interior planes that are observable in the input images. Our neural architecture is trained in a category-agnostic manner and consists of a motion-aware multi-view analysis phase comprising pose, depth, and motion estimation, followed by interior plane detection in images and in 3D space, and finally multi-view plane fusion. In addition, our method predicts part articulations and is able to realize, and even extrapolate, the captured motions on the target 3D object. We evaluate our method through quantitative and qualitative comparisons to baselines and alternative solutions, as well as by testing on untrained object categories and real image inputs to assess its generalization capabilities.
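To make the three-stage pipeline named above concrete, the following is a minimal structural sketch, not the authors' implementation: every class and function name (`ViewEstimate`, `analyze_views`, `detect_interior_planes`, `fuse_planes`) and all placeholder computations are hypothetical stand-ins for (1) motion-aware multi-view analysis, (2) interior plane detection, and (3) multi-view plane fusion.

```python
# Hypothetical sketch of the RoSI pipeline stages described in the abstract.
# All names and placeholder math below are assumptions, not the paper's code.
from dataclasses import dataclass
import numpy as np

@dataclass
class ViewEstimate:
    """Per-image outputs of the motion-aware multi-view analysis phase."""
    pose: np.ndarray    # assumed 4x4 camera/object pose
    depth: np.ndarray   # assumed HxW depth map
    motion: np.ndarray  # assumed per-part articulation parameters

def analyze_views(images: list) -> list:
    """Stage 1 (hypothetical): estimate pose, depth, and part motion per image."""
    estimates = []
    for img in images:
        h, w = img.shape[:2]
        estimates.append(ViewEstimate(
            pose=np.eye(4),          # placeholder identity pose
            depth=np.ones((h, w)),   # placeholder unit depth
            motion=np.zeros(3),      # placeholder articulation parameters
        ))
    return estimates

def detect_interior_planes(img, est: ViewEstimate) -> list:
    """Stage 2 (hypothetical): detect interior planes in the image and lift
    them to 3D via the estimated depth and pose. A plane is represented as
    (unit normal n, offset d) with the convention n.x + d = 0."""
    return [(np.array([0.0, 0.0, 1.0]), -1.0)]  # single placeholder plane

def fuse_planes(per_view_planes, angle_tol=0.1, offset_tol=0.05):
    """Stage 3 (hypothetical): merge near-duplicate 3D planes across views,
    keeping one representative per cluster of similar normals/offsets."""
    fused = []
    for planes in per_view_planes:
        for n, d in planes:
            duplicate = any(
                np.dot(n, fn) > 1.0 - angle_tol and abs(d - fd) < offset_tol
                for fn, fd in fused
            )
            if not duplicate:
                fused.append((n, d))
    return fused

if __name__ == "__main__":
    views = [np.zeros((64, 64, 3)) for _ in range(3)]  # 3 dummy RGB views
    ests = analyze_views(views)
    planes = [detect_interior_planes(v, e) for v, e in zip(views, ests)]
    print(fuse_planes(planes))  # -> one fused interior plane
```

Under these assumptions, the sketch only illustrates the data flow (images -> per-view estimates -> per-view 3D planes -> fused plane set); the actual method replaces each placeholder with a learned, category-agnostic network stage.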