Pose-guided Generative Adversarial Net for Novel View Action Synthesis

We focus on the problem of novel-view human action synthesis: given an action video, the goal is to generate the same action from an unseen viewpoint. Novel-view video synthesis is more challenging than image synthesis because it requires a sequence of realistic frames that are also temporally coherent. In addition, transferring different actions to a novel target view requires awareness of the action category and the viewpoint change simultaneously. To address these challenges, we propose a novel framework named Pose-guided Action Separable Generative Adversarial Net (PAS-GAN), which utilizes pose to alleviate the difficulty of this task. First, we propose a recurrent pose-transformation module which transforms actions from the source view to the target view and generates a novel-view pose sequence in 2D coordinate space. Second, a well-transformed pose sequence enables us to separate the action and background in the target view. We employ a novel local-global spatial transformation module to effectively generate sequential video features in the target view using these action and background features. Finally, the generated video features are used to synthesize human action with the help of a 3D decoder. Moreover, to focus on dynamic action in the video, we propose a novel multi-scale action-separable loss which further improves the video quality. We conduct extensive experiments on two large-scale multi-view human action datasets, NTU-RGBD and PKU-MMD, demonstrating the effectiveness of PAS-GAN, which outperforms existing approaches.
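The abstract does not spell out the multi-scale action-separable loss, so the following is only a minimal sketch of one way such a loss could look, not the PAS-GAN formulation. It assumes a pose-derived mask marking the action region, an L1 reconstruction error, average-pool downsampling, and a foreground weight; the names `downsample`, `action_separable_l1`, `scales`, and `fg_weight` are all hypothetical.

```python
import numpy as np


def downsample(x, factor):
    """Average-pool the last two (H, W) axes by an integer factor."""
    h, w = x.shape[-2] // factor, x.shape[-1] // factor
    x = x[..., : h * factor, : w * factor]  # crop so H and W divide evenly
    x = x.reshape(*x.shape[:-2], h, factor, w, factor)
    return x.mean(axis=(-3, -1))


def action_separable_l1(pred, target, mask, scales=(1, 2, 4), fg_weight=2.0):
    """Hypothetical masked L1 loss averaged over several spatial scales.

    pred, target: (T, H, W) videos.
    mask: (T, H, W) values in [0, 1], where 1 marks the action region
          (assumed here to come from the transformed pose sequence).
    """
    total = 0.0
    for s in scales:
        p, t, m = (downsample(a, s) for a in (pred, target, mask))
        err = np.abs(p - t)
        # Average the error separately over action and background regions.
        fg = (err * m).sum() / max(m.sum(), 1e-8)
        bg = (err * (1.0 - m)).sum() / max((1.0 - m).sum(), 1e-8)
        # Assumption: action-region errors are weighted more heavily.
        total += fg_weight * fg + bg
    return total / len(scales)
```

Under these assumptions, an error of the same magnitude costs more when it falls inside the action mask than when it falls on the background, which is one plausible reading of "focusing on dynamic action in the video".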

