Transferring human motion and appearance between videos of human actors remains one of the key challenges in Computer Vision. Despite the advances of recent image-to-image translation approaches, there are several transfer contexts where most end-to-end learning-based retargeting methods still perform poorly. Transferring human appearance from one actor to another is only guaranteed under a strict setup, generally built around the specificities of the methods' training regimes. In this work, we propose a shape-aware approach based on a hybrid image-based rendering technique that exhibits visual retargeting quality competitive with state-of-the-art neural rendering approaches. The formulation leverages the user's body shape in the retargeting while enforcing physical constraints on the motion in 3D and in the 2D image domain. We also present a new video retargeting benchmark dataset composed of different videos with annotated human motions for evaluating the task of synthesizing videos of people, which can serve as a common base for tracking progress in the field. The dataset and its evaluation protocols are designed to assess retargeting methods in more general and challenging conditions. Our method is validated in several experiments comprising publicly available videos of actors with different body shapes, motion types, and camera setups. The dataset and retargeting code are publicly available to the community at: https://www.verlab.dcc.ufmg.br/retargeting-motion.
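To make the shape-aware idea concrete (adopting the source actor's motion while keeping the target actor's body shape), the sketch below uses a SMPL-like parametric body model via the publicly available `smplx` package. This is a minimal illustration under stated assumptions, not the paper's actual implementation: the model path, the zero-initialized parameters, and the variable names are hypothetical placeholders, and the 3D/2D physical constraints mentioned in the abstract are only noted in comments.

```python
# Minimal sketch of shape-aware motion transfer with a SMPL-like body model.
# Assumes the `smplx` package (pip install smplx) and locally downloaded SMPL
# model files; paths and values here are illustrative, not the paper's code.
import torch
import smplx

# Hypothetical path to the downloaded SMPL model files.
model = smplx.create("models/", model_type="smpl", gender="neutral")

# Shape of the TARGET actor (10 SMPL shape coefficients), estimated once.
betas_target = torch.zeros(1, 10)

# Pose of the SOURCE actor for one frame: global orientation (3 values) and
# 69 axis-angle body-pose parameters, e.g. from a monocular pose estimator.
global_orient_source = torch.zeros(1, 3)
body_pose_source = torch.zeros(1, 69)

# Shape-aware retargeting: combine the source motion with the target shape,
# so the synthesized body keeps the target actor's proportions.
output = model(betas=betas_target,
               global_orient=global_orient_source,
               body_pose=body_pose_source)
vertices = output.vertices  # (1, 6890, 3) posed mesh of the target body
joints = output.joints      # 3D joints, usable for physical constraints
                            # such as keeping feet on the ground plane
```

Because a SMPL-style model decouples shape (`betas`) from pose, swapping only the pose parameters transfers the motion without distorting the target's body; in the full method the retargeted pose would additionally be refined against the motion constraints in 3D and in the 2D image domain before image-based rendering.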