Recovering the skeletal shape of an animal from a monocular video is a longstanding challenge. Prevailing animal reconstruction methods often adopt a control-point-driven animation model and optimize bone transforms individually without considering skeletal topology, yielding unsatisfactory shape and articulation. In contrast, humans can easily infer the articulation structure of an unknown animal by associating it with an articulated character they have seen before. Inspired by this observation, we present CASA, a novel Category-Agnostic Skeletal Animal reconstruction method consisting of two major components: a video-to-shape retrieval process and a neural inverse graphics framework. During inference, CASA first retrieves an articulated character from a bank of 3D character assets whose rendered images score highly against the input video according to a pretrained vision-language model. CASA then integrates the retrieved character into an inverse graphics framework and jointly infers the shape deformation, skeleton structure, and skinning weights through optimization. Experiments validate the efficacy of CASA in terms of shape reconstruction and articulation. We further demonstrate that the resulting skeletal-animated characters can be used for re-animation.
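To make the retrieval step concrete, the sketch below shows one plausible way to score candidate assets against video frames with a pretrained vision-language model. It is a minimal illustration, not the authors' implementation: it assumes CLIP (via Hugging Face `transformers`) as the scoring model, and the helper `retrieve_best_asset` together with its inputs (`video_frames`, `rendered_views`) are hypothetical names introduced here for clarity.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

# Assumption: a CLIP image encoder stands in for the pretrained
# vision-language model mentioned in the abstract.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()


def embed_images(images):
    """Encode a list of PIL images into unit-norm CLIP features."""
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return torch.nn.functional.normalize(feats, dim=-1)


def retrieve_best_asset(video_frames, rendered_views):
    """Pick the asset whose renders best match the video frames.

    video_frames:   list of PIL.Image frames from the input video.
    rendered_views: dict mapping asset id -> list of PIL.Image renders
                    of that articulated character (hypothetical format).
    """
    frame_feats = embed_images(video_frames)            # (F, D)
    best_id, best_score = None, float("-inf")
    for asset_id, renders in rendered_views.items():
        render_feats = embed_images(renders)            # (R, D)
        # Mean cosine similarity between all frame/render pairs.
        score = (frame_feats @ render_feats.T).mean().item()
        if score > best_score:
            best_id, best_score = asset_id, score
    return best_id, best_score
```

The retrieved asset would then serve as the initialization for the inverse graphics stage, which optimizes shape deformation, skeleton structure, and skinning weights against the video.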