Learning deformable 3D objects from 2D images is an extremely ill-posed problem. Existing methods rely on explicit supervision, such as template shape models and keypoint annotations, to establish multi-view correspondences, which restricts their applicability to objects "in the wild". In this paper, we propose to use monocular videos, which naturally provide correspondences across time, allowing us to learn 3D shapes of deformable object categories without explicit keypoints or template shapes. Specifically, we present DOVE, which learns to predict 3D canonical shape, deformation, viewpoint and texture from a single 2D image of a bird, given a bird video collection as well as automatically obtained silhouettes and optical flow as training data. Our method reconstructs temporally consistent 3D shape and deformation, so that, given only a single image, the bird can be animated and re-rendered from arbitrary viewpoints.