Image animation transfers the motion of a driving video to a static object in a source image while keeping the source identity unchanged. Great progress has been made recently in unsupervised motion transfer, where no labelled data or ground-truth domain priors are needed. However, current unsupervised approaches still struggle when there are large motion or viewpoint discrepancies between the source and driving images. In this paper, we introduce three measures that we found effective for overcoming such large viewpoint changes. Firstly, to obtain more fine-grained motion deformation fields, we propose to apply Neural ODEs to parametrize the evolution dynamics of the motion transfer from source to driving. Secondly, to handle occlusions caused by large viewpoint and motion changes, we take advantage of the appearance flow obtained from the source image itself ("self-appearance"), which essentially "borrows" similar structures from other regions of the image to inpaint missing regions. Finally, our framework can also leverage information from additional reference views, which help to drive the source identity in spite of varying motion states. Extensive experiments demonstrate that our approach outperforms the state of the art by a significant margin (~40%) across six benchmarks ranging from human faces and bodies to robots and cartoon characters. A model generality analysis further indicates that our approach generalises best across different object categories.
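To make the first measure more concrete, the sketch below illustrates the general idea of letting a Neural ODE evolve a coarse source-to-driving deformation field into a finer one. It is not the paper's implementation: the module `FlowDynamics`, the function `evolve_flow`, the fixed-step Euler integration, and all tensor shapes are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the authors' code) of evolving a dense
# 2-channel deformation field w(t) with a learned ODE dw/dt = f(t, w).
import torch
import torch.nn as nn

class FlowDynamics(nn.Module):
    """Hypothetical dynamics network f(t, w) predicting the time derivative
    of a dense deformation field w of shape (B, 2, H, W)."""
    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, 2, 3, padding=1),
        )

    def forward(self, t, w):
        return self.net(w)  # dw/dt at "time" t

def evolve_flow(w0, dynamics, steps=10, t1=1.0):
    """Fixed-step Euler integration of dw/dt = f(t, w) from t = 0 to t = t1.
    w0: coarse source-to-driving deformation field, e.g. from sparse keypoints."""
    w, dt = w0, t1 / steps
    for k in range(steps):
        t = torch.tensor(k * dt)
        w = w + dt * dynamics(t, w)
    return w  # refined deformation field

if __name__ == "__main__":
    coarse = torch.zeros(1, 2, 64, 64)          # placeholder coarse motion field
    refined = evolve_flow(coarse, FlowDynamics())
    print(refined.shape)                        # torch.Size([1, 2, 64, 64])
```

In practice, an adaptive ODE solver (e.g. from a library such as torchdiffeq) could replace the Euler loop; the key point is that the deformation field is refined continuously rather than predicted in a single step.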