Given a single image of a general object such as a chair, can we also restore its articulated 3D shape, similar to human modeling, so as to animate its plausible articulations and diverse motions? This is an interesting new question with numerous potential downstream applications in augmented and virtual reality. Compared with previous efforts on object manipulation, our work goes beyond 2D manipulation and rigid deformation, and involves articulated manipulation. To achieve this goal, we propose an automated approach that builds such 3D generic objects from single images and embeds articulated skeletons in them. Specifically, our framework starts by reconstructing the 3D object from an input image. Afterwards, to extract skeletons for generic 3D objects, we develop a novel skeleton prediction method with a multi-head structure that estimates a skeleton probability field using deep implicit functions. We also collect a dataset of generic 3D objects with ground-truth annotated skeletons. Empirically, our approach achieves satisfactory performance on public datasets as well as our in-house dataset, surpassing the state of the art by a noticeable margin on both 3D reconstruction and skeleton prediction.
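To make the "multi-head structure for skeleton probability field estimation" concrete, below is a minimal sketch of what such a network could look like in PyTorch. It assumes the common deep-implicit-function setup: each 3D query point, conditioned on an image feature code, is mapped through a shared trunk to multiple heads (here hypothetically a joint head and a bone head) that each output a per-point probability. The layer sizes, head names, and conditioning scheme are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn


class SkeletonImplicitField(nn.Module):
    """Hypothetical multi-head deep implicit function.

    A shared MLP trunk processes each 3D query point concatenated
    with an image feature code; separate heads then estimate
    probability fields (e.g., joints vs. bones). All sizes and
    head names are assumptions for illustration.
    """

    def __init__(self, feat_dim: int = 256, hidden: int = 256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(3 + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Multi-head outputs: per-point probabilities in [0, 1].
        self.joint_head = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())
        self.bone_head = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, points: torch.Tensor, feat: torch.Tensor):
        # points: (B, N, 3) query coordinates; feat: (B, feat_dim) image code.
        cond = feat.unsqueeze(1).expand(-1, points.shape[1], -1)
        h = self.trunk(torch.cat([points, cond], dim=-1))
        return self.joint_head(h), self.bone_head(h)


model = SkeletonImplicitField()
pts = torch.rand(2, 100, 3)   # 100 query points per sample
code = torch.rand(2, 256)     # per-image feature codes
p_joint, p_bone = model(pts, code)
print(p_joint.shape, p_bone.shape)
```

Sampling this field densely over the reconstructed object's volume would yield the probability maps from which a discrete skeleton can then be extracted.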