We present Neural Marionette, an unsupervised approach that discovers the skeletal structure from a dynamic sequence and learns to generate diverse motions that are consistent with the observed motion dynamics. Given a video stream of point cloud observations of an articulated body under arbitrary motion, our approach discovers the unknown low-dimensional skeletal relationship that can effectively represent the movement. The discovered structure is then used to encode the motion priors of dynamic sequences in a latent structure, which can be decoded into relative joint rotations that represent the full skeletal motion. Our approach works without any prior knowledge of the underlying motion or skeletal structure, and we demonstrate that the discovered structure is comparable to the hand-labeled ground-truth skeleton in representing a 4D motion sequence. The skeletal structure embeds the general semantics of the possible motion space and can therefore generate motions for diverse scenarios. We verify that the learned motion prior generalizes to multi-modal sequence generation, interpolation between two poses, and motion retargeting to a different skeletal structure.
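To make concrete what it means to decode relative joint rotations into a full skeletal motion, the following is a minimal forward-kinematics sketch. It is not the authors' implementation: the function names (`rot_z`, `forward_kinematics`), the parent-index tree encoding, the fixed bone offsets, and the toy three-joint chain are all assumptions chosen for illustration.

```python
import numpy as np


def rot_z(theta):
    """Rotation matrix for an angle theta (radians) about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])


def forward_kinematics(parents, offsets, local_rots, root_pos):
    """Compose relative joint rotations along a skeleton tree.

    parents[j]    : index of joint j's parent, -1 for the root (assumed encoding);
                    joints must be ordered so that a parent precedes its children.
    offsets[j]    : bone vector from the parent to joint j in the parent's frame.
    local_rots[j] : 3x3 rotation of joint j relative to its parent.
    Returns an (n, 3) array of global joint positions.
    """
    n = len(parents)
    global_rots = [None] * n
    positions = np.zeros((n, 3))
    for j in range(n):
        p = parents[j]
        if p < 0:
            # Root joint: its relative rotation is already global.
            global_rots[j] = local_rots[j]
            positions[j] = root_pos
        else:
            # Child joint: accumulate the parent's global rotation.
            global_rots[j] = global_rots[p] @ local_rots[j]
            positions[j] = positions[p] + global_rots[p] @ offsets[j]
    return positions


# Toy three-joint chain with unit-length bones along x (hypothetical skeleton).
parents = [-1, 0, 1]
offsets = np.array([[0.0, 0.0, 0.0],
                    [1.0, 0.0, 0.0],
                    [1.0, 0.0, 0.0]])
local_rots = [rot_z(np.pi / 2), np.eye(3), np.eye(3)]
positions = forward_kinematics(parents, offsets, local_rots, np.zeros(3))
```

In this sketch, rotating only the root by 90 degrees swings the whole chain from the x-axis onto the y-axis, which illustrates why a sequence of per-joint relative rotations plus a fixed tree is enough to describe the motion of every joint.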