Human motion transfer (HMT) aims to generate a video clip of a target subject imitating the motion of a source subject. Although previous methods have achieved promising results in synthesizing high-quality videos, they overlook the individualized motion information of the source and target subjects, which is crucial to the realism of the motion in the generated video. To address this problem, we propose a novel identity-preserved HMT network, termed \textit{IDPres}. \textit{IDPres} is a skeleton-based approach that incorporates the target's individualized motion and skeleton information to augment identity representations, significantly enhancing the realism of movements in the generated videos. Our method focuses on the fine-grained disentanglement and synthesis of motion. To improve representation learning in the latent space and facilitate the training of \textit{IDPres}, we introduce three training schemes that enable \textit{IDPres} to disentangle different representations simultaneously and control them accurately, ensuring the synthesis of ideal motions. To evaluate the proportion of individualized motion information preserved in the generated video, we are the first to introduce a quantitative metric, termed Identity Score (\textit{ID-Score}), motivated by the success of gait recognition methods in capturing identity information. Moreover, we collect an identity-motion paired dataset, \textit{Dancer101}, consisting of solo-dance videos of 101 subjects from the public domain, providing a benchmark to promote the development of HMT methods. Extensive experiments demonstrate that the proposed \textit{IDPres} surpasses existing state-of-the-art techniques in terms of reconstruction accuracy, motion realism, and identity preservation.