Remarkable progress has been made in 3D reconstruction of rigid structures from a video or a collection of images. However, it remains challenging to reconstruct nonrigid structures from RGB inputs, due to the under-constrained nature of the problem. While template-based approaches, such as parametric shape models, have achieved great success in modeling the "closed world" of known object categories, they cannot handle well the "open world" of novel object categories or outlier shapes. In this work, we introduce a template-free approach to learn 3D shapes from a single video. It adopts an analysis-by-synthesis strategy that forward-renders object silhouettes, optical flow, and pixel values to compare with video observations, which generates gradients to adjust the camera, shape, and motion parameters. Without using a category-specific shape template, our method faithfully reconstructs nonrigid 3D structures from videos of humans, animals, and objects of unknown classes. Code will be available at lasr-google.github.io.
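The analysis-by-synthesis loop described above can be sketched with a toy 1-D problem: forward-render a soft silhouette from a parameter, compare it with the observation, and follow the gradient of the mismatch to adjust the parameter. The Gaussian "renderer", the L2 loss, and all names below are illustrative stand-ins chosen for this sketch, not the paper's differentiable mesh renderer or its actual parameterization.

```python
import math

# Toy analysis-by-synthesis: recover a 1-D "shape" parameter (the
# silhouette center) by rendering, comparing, and descending the gradient.

XS = [-5.0 + 10.0 * i / 199 for i in range(200)]  # 1-D "image" coordinates

def render_silhouette(center):
    """Soft 1-D silhouette: a Gaussian bump centered at `center`."""
    return [math.exp(-0.5 * (x - center) ** 2) for x in XS]

def fit(observed, lr=0.01, steps=300):
    center = 0.0  # initial parameter guess
    for _ in range(steps):
        rendered = render_silhouette(center)
        grad = 0.0
        for x, r, o in zip(XS, rendered, observed):
            residual = r - o             # synthesized vs. observed value
            d_render = r * (x - center)  # analytic d(render)/d(center)
            grad += 2.0 * residual * d_render
        center -= lr * grad              # adjust the parameter
    return center

observed = render_silhouette(2.0)  # the "video observation" to explain
estimate = fit(observed)           # gradient descent recovers center near 2.0
```

In the paper this loop runs over many frames at once, with the rendered silhouette, flow, and color losses summed, and the gradients updating camera, shape, and motion parameters jointly rather than a single scalar.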