Creating high-quality articulated 3D models of animals is challenging, whether through manual creation or with 3D scanning tools. Techniques that reconstruct articulated 3D objects from 2D images are therefore highly useful. In this work, we propose a practical problem setting: estimating the 3D pose and shape of animals given only a few (10-30) in-the-wild images of a particular animal species (say, horse). In contrast to existing works that rely on pre-defined template shapes, we do not assume any form of 2D or 3D ground-truth annotations, nor do we leverage any multi-view or temporal information. Moreover, each input image ensemble can contain animal instances with varying poses, backgrounds, illuminations, and textures. Our key insight is that 3D parts have a much simpler shape than the overall animal and that they are robust to animal pose articulations. Following this insight, we propose LASSIE, a novel optimization framework that discovers 3D parts in a self-supervised manner with minimal user intervention. A key driving force behind LASSIE is the enforcement of 2D-3D part consistency using self-supervisory deep features. Experiments on Pascal-Part and self-collected in-the-wild animal datasets demonstrate considerably better 3D reconstructions, as well as better 2D and 3D part discovery, compared to prior art. Project page: chhankyao.github.io/lassie/