We present a method for retiming people in an ordinary, natural video -- manipulating and editing the time in which different motions of individuals in the video occur. We can temporally align different motions, change the speed of certain actions (speeding up/slowing down, or entirely "freezing" people), or "erase" selected people from the video altogether. We achieve these effects computationally via a dedicated learning-based layered video representation, where each frame in the video is decomposed into separate RGBA layers, representing the appearance of different people in the video. A key property of our model is that it not only disentangles the direct motions of each person in the input video, but also correlates each person automatically with the scene changes they generate -- e.g., shadows, reflections, and motion of loose clothing. The layers can be individually retimed and recombined into a new video, allowing us to achieve realistic, high-quality renderings of retiming effects for real-world videos depicting complex actions and involving multiple individuals, including dancing, trampoline jumping, or group running.
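The core idea above — decompose each frame into per-person RGBA layers, retime each layer independently, then recomposite — can be sketched with standard "over" alpha compositing. This is a minimal illustrative sketch, not the paper's learned neural renderer: the function name, the callable `retime_maps` interface, and the assumption that layers are ordered back to front are all ours for illustration.

```python
import numpy as np

def retime_and_composite(layers, retime_maps, num_out_frames):
    """Recombine per-person RGBA layers into a retimed video.

    layers:      list of arrays, each (T, H, W, 4), float in [0, 1] --
                 one RGBA layer per person (or background), ordered
                 back to front.
    retime_maps: list of callables f(t_out) -> t_in, one per layer,
                 mapping an output frame index to that layer's source
                 frame (identity = unchanged, constant = "frozen",
                 t // 2 = half speed, omit a layer = "erased").
    """
    _, H, W, _ = layers[0].shape
    out = np.zeros((num_out_frames, H, W, 3))
    for t in range(num_out_frames):
        frame = np.zeros((H, W, 3))
        for layer, remap in zip(layers, retime_maps):
            src = int(np.clip(remap(t), 0, layer.shape[0] - 1))
            rgba = layer[src]
            alpha = rgba[..., 3:4]
            # Standard "over" compositing: where the layer is opaque it
            # replaces what is behind it. Because each learned layer also
            # carries the person's shadows, reflections, etc., those
            # effects move with the person when the layer is retimed.
            frame = rgba[..., :3] * alpha + frame * (1.0 - alpha)
        out[t] = frame
    return out
```

For example, passing `retime_maps = [lambda t: t, lambda t: 0]` keeps the background playing normally while freezing the person in the second layer at their first frame.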