Transferring human motion and appearance between videos of human actors remains one of the key challenges in Computer Vision. Despite the advances of recent image-to-image translation approaches, there are several transfer contexts where most end-to-end learning-based retargeting methods still perform poorly. These methods only reliably transfer human appearance from one actor to another when a strict setup is followed, generally built around the specificities of their training regime. The contribution of this paper is two-fold: first, we propose a novel, high-performing approach based on a hybrid image-based rendering technique that exhibits visual retargeting quality competitive with state-of-the-art neural rendering approaches. The formulation leverages the user's body shape in the retargeting while enforcing physical constraints on the motion in both the 3D and 2D image domains. Second, we present a new video retargeting benchmark dataset composed of different videos with annotated human motions for evaluating the task of synthesizing videos of people; it can serve as a common basis for tracking progress in the field. The dataset and its evaluation protocols are designed to assess retargeting methods under more general and challenging conditions. Our method is validated in several experiments comprising publicly available videos of actors with different body shapes, motion types, and camera setups. The dataset and retargeting code are publicly available to the community at: https://www.verlab.dcc.ufmg.br/retargeting-motion.
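To make the idea of shape-aware, physically constrained retargeting concrete, the following is a minimal, hypothetical sketch, not the authors' implementation. It illustrates the two ingredients the abstract names on a toy kinematic chain: source 3D joint positions are re-posed onto a target skeleton with different bone lengths (the "body shape" term), and a simple ground-contact constraint keeps the retargeted pose from floating or penetrating the floor (the "physical constraints" term). The `PARENTS` chain, function names, and the scaling/contact heuristics are all illustrative assumptions.

```python
# Hypothetical illustration of shape-aware motion retargeting with a simple
# physical constraint; NOT the paper's formulation.
import numpy as np

# Toy kinematic chain: joint index -> parent index (-1 marks the root),
# e.g., hip -> knee -> ankle -> toe.
PARENTS = [-1, 0, 1, 2]

def bone_lengths(joints: np.ndarray) -> np.ndarray:
    """Per-bone lengths of a (J, 3) joint-position array (0 for the root)."""
    return np.array([
        np.linalg.norm(joints[j] - joints[p]) if p >= 0 else 0.0
        for j, p in enumerate(PARENTS)
    ])

def retarget_pose(src: np.ndarray, tgt_lengths: np.ndarray) -> np.ndarray:
    """Copy the source pose's bone directions onto the target's bone lengths."""
    out = np.zeros_like(src)
    out[0] = src[0]  # keep the source root translation
    for j, p in enumerate(PARENTS):
        if p < 0:
            continue
        direction = src[j] - src[p]
        direction /= np.linalg.norm(direction)  # unit bone direction
        out[j] = out[p] + direction * tgt_lengths[j]
    return out

def enforce_ground_contact(pose: np.ndarray, floor_y: float = 0.0) -> np.ndarray:
    """Translate the pose vertically so its lowest joint rests on the floor."""
    offset = floor_y - pose[:, 1].min()
    return pose + np.array([0.0, offset, 0.0])

# Source actor's pose, and a target actor whose limbs are 15% longer.
src_pose = np.array([[0.0, 1.0, 0.0],
                     [0.0, 0.55, 0.0],
                     [0.0, 0.1, 0.0],
                     [0.15, 0.0, 0.0]])
tgt_lengths = bone_lengths(src_pose) * 1.15

retargeted = enforce_ground_contact(retarget_pose(src_pose, tgt_lengths))
print(retargeted)
```

In a full system, the per-bone rescaling would be replaced by a parametric body model fit, and the contact term would enter an optimization alongside 2D reprojection constraints; the sketch above only shows why accounting for the target's body shape changes the retargeted joint positions.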