We present a new method for generating controllable, dynamically responsive, and photorealistic human animations. Given an image of a person, our system allows the user to generate Physically plausible Upper Body Animation (PUBA) through interactions in image space, such as dragging the person's hand to various locations. We formulate a reinforcement learning problem to train a dynamic model that predicts the person's next 2D state (i.e., keypoints on the image) conditioned on a 3D action (i.e., joint torques), and a policy that outputs optimal actions to control the person to achieve desired goals. The dynamic model leverages the expressiveness of 3D simulation and the visual realism of 2D videos. PUBA generates 2D keypoint sequences that achieve task goals while remaining responsive to forceful perturbations. The keypoint sequences are then translated by a pose-to-image generator to produce the final photorealistic video.
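To make the two learned components concrete, the following is a minimal illustrative sketch (not the authors' implementation): a dynamics model that maps a 2D keypoint state and a joint-torque action to the next keypoint state, and a policy that maps the current state and a user-specified goal (e.g., a dragged hand target) to an action. The dimensions `K`, `J`, and `GOAL_DIM`, the network sizes, and all names are assumptions for illustration only.

```python
# Hypothetical sketch of the dynamics model + policy described in the abstract.
import torch
import torch.nn as nn

K, J, GOAL_DIM = 17, 20, 2  # assumed: #keypoints, #actuated joints, goal size (e.g., target hand location)

class DynamicsModel(nn.Module):
    """Predicts the next 2D keypoint state given the current state and 3D joint torques."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * K + J, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 2 * K),
        )

    def forward(self, keypoints_2d, joint_torques):
        x = torch.cat([keypoints_2d.flatten(1), joint_torques], dim=-1)
        # Residual prediction of the next keypoint configuration.
        return keypoints_2d + self.net(x).view(-1, K, 2)

class Policy(nn.Module):
    """Maps the current 2D state and a user goal to a joint-torque action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * K + GOAL_DIM, 256), nn.ReLU(),
            nn.Linear(256, J), nn.Tanh(),
        )

    def forward(self, keypoints_2d, goal):
        return self.net(torch.cat([keypoints_2d.flatten(1), goal], dim=-1))

# Rollout: the policy drives the dynamics model toward the goal; the resulting
# keypoint sequence would then be rendered by a pose-to-image generator.
dynamics, policy = DynamicsModel(), Policy()
state = torch.zeros(1, K, 2)        # placeholder initial keypoints
goal = torch.tensor([[0.8, 0.3]])   # e.g., desired hand location in image coordinates
keypoint_sequence = [state]
for _ in range(30):
    action = policy(state, goal)
    state = dynamics(state, action)
    keypoint_sequence.append(state)
```

In this sketch both modules are plain MLPs; the actual method presumably trains the policy with reinforcement learning against the learned dynamics, which is not shown here.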