Learning monocular 3D reconstruction of articulated categories from motion

Monocular 3D reconstruction of articulated object categories is challenging due to the lack of training data and the inherent ill-posedness of the problem. In this work we use video self-supervision, enforcing the consistency of consecutive 3D reconstructions through a motion-based cycle loss. This substantially improves both optimization-based and learning-based 3D mesh reconstruction. We further introduce an interpretable model of 3D template deformations that controls a 3D surface through the displacement of a small number of local, learnable handles. We formulate this operation as a structured layer relying on mesh-Laplacian regularization and show that it can be trained in an end-to-end manner. We finally introduce a per-sample numerical optimisation approach that jointly optimises over mesh displacements and cameras within a video, boosting accuracy both during training and as test-time post-processing. While relying exclusively on a small set of videos collected per category for supervision, we obtain state-of-the-art reconstructions with diverse shapes, viewpoints and textures for multiple articulated object categories.
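
To make the idea of controlling a surface through a few handles under Laplacian regularization more concrete, here is a minimal NumPy sketch. It is not the authors' structured layer. It assumes a uniform graph Laplacian, treats each handle as a single vertex, and solves one linear least-squares problem. The names `uniform_laplacian`, `deform_with_handles` and the `weight` parameter are hypothetical and chosen for illustration only.

```python
import numpy as np

def uniform_laplacian(num_vertices, edges):
    """Dense uniform graph Laplacian L = D - A built from unique undirected edges."""
    L = np.zeros((num_vertices, num_vertices))
    for i, j in edges:
        L[i, j] -= 1.0
        L[j, i] -= 1.0
        L[i, i] += 1.0
        L[j, j] += 1.0
    return L

def deform_with_handles(template, edges, handle_idx, handle_targets, weight=10.0):
    """Move a few handle vertices and let the remaining vertices follow.

    Solves  min_V ||L V - L V0||^2 + weight^2 * ||V[handles] - targets||^2,
    where V0 is the template. The first term keeps each vertex's offset from
    its neighbours close to the template's (assumed uniform Laplacian); the
    second is a soft constraint pulling the handles to their targets.
    """
    n = template.shape[0]
    L = uniform_laplacian(n, edges)
    # Selection matrix that picks out the handle vertices.
    S = np.zeros((len(handle_idx), n))
    S[np.arange(len(handle_idx)), handle_idx] = 1.0
    A = np.vstack([L, weight * S])
    b = np.vstack([L @ template, weight * np.asarray(handle_targets)])
    V, *_ = np.linalg.lstsq(A, b, rcond=None)
    return V
```

Because the solution is a linear function of the handle targets, an operation of this kind can in principle be embedded in a network and differentiated, which is the property the abstract relies on. The real method learns the handles and their displacements, which this sketch does not attempt.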

Related content

3D reconstruction

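In computer vision, 3D reconstruction is the process of recovering three-dimensional information from single-view or multi-view images. Because a single view carries incomplete information, reconstruction from a single view has to rely on prior knowledge. Multi-view reconstruction (similar to human binocular localization) is comparatively easier: the cameras are first calibrated, that is, the relationship between each camera's image coordinate system and the world coordinate system is computed, and the 3D information is then reconstructed from the information in multiple 2D images. 3D object reconstruction is a common scientific problem and core technology across computer-aided geometric design (CAGD), computer graphics (CG), computer animation, computer vision, medical image processing, scientific computing, virtual reality, and digital media creation. There are two main ways to generate a 3D representation of an object in a computer. One uses geometric modeling software to build a 3D geometric model under human control through human-computer interaction; the other acquires the geometric shape of a real object by some means. The former is a mature technology supported by a number of existing software packages, such as 3DMAX, Maya, AutoCAD, and UG, which generally represent geometric shapes with curves and surfaces that have mathematical expressions. The latter is generally called the 3D reconstruction process: the mathematical process and computer technology of recovering an object's 3D information (shape and so on) from its 2D projections, including steps such as data acquisition, preprocessing, point cloud registration, and feature analysis.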
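
The multi-view procedure described above (calibrate the cameras, then recover 3D from several 2D images) can be illustrated with linear triangulation. The sketch below assumes calibration has already produced a 3x4 projection matrix per camera and that the same point has been located in each image. It uses the standard direct linear transform (DLT) for a single point. The function names `triangulate_point` and `project`, and all numeric values in the demo, are made up for illustration.

```python
import numpy as np

def project(P, X):
    """Project a 3D point X with a 3x4 camera matrix P to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate_point(projections, pixels):
    """Recover one 3D point from its pixel coordinates in two or more views.

    Each view contributes two linear equations in the homogeneous point X:
        u * (P[2] @ X) - P[0] @ X = 0
        v * (P[2] @ X) - P[1] @ X = 0
    The least-squares solution is the right singular vector of the stacked
    system associated with the smallest singular value.
    """
    rows = []
    for P, (u, v) in zip(projections, pixels):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # back from homogeneous to Euclidean coordinates

if __name__ == "__main__":
    # Hypothetical calibrated stereo pair: shared intrinsics K, second camera
    # shifted along the x axis (all values are illustrative assumptions).
    K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])
    X_true = np.array([0.2, -0.1, 4.0])
    pixels = [project(P1, X_true), project(P2, X_true)]
    print(triangulate_point([P1, P2], pixels))
```

With noise-free pixel coordinates the recovered point matches the original up to floating-point error. With real detections the DLT result is usually only a starting point that a nonlinear refinement improves.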