Human motion transfer (HMT) aims to generate a video clip of a target subject imitating a source subject's motion. Although previous methods have achieved remarkable results in synthesizing high-quality videos, they ignore the effect of individualized motion information in the source and target motions, \textit{e.g.}, fine-grained and high-frequency motion details, on the realism of the motion in the generated video. To address this problem, we propose an identity-preserved HMT network (\textit{IDPres}) that follows the pipeline of skeleton-based methods. \textit{IDPres} takes both individualized motion and skeleton information to enrich motion representations and improve the realism of motions in the generated videos. With individualized motion, our method focuses on fine-grained disentanglement and synthesis of motion. To improve the representation capability of the latent space and facilitate training, we design a training scheme that allows \textit{IDPres} to disentangle different representations simultaneously and control them to accurately synthesize the desired motions. Furthermore, to the best of our knowledge, there is no available metric for evaluating the proportion of identity information (both individualized motion and skeleton information) in a generated video. Therefore, we propose a novel quantitative metric, called Identity Score (\textit{IDScore}), based on gait recognition. We also collected a dataset of solo-dance videos of 101 subjects from the public domain, named \textit{Dancer101}, to evaluate our method. Comprehensive experiments show that the proposed method outperforms state-of-the-art methods in terms of reconstruction accuracy and motion realism.