We introduce MUGL, a novel deep neural model for large-scale, diverse generation of single and multi-person pose-based action sequences with locomotion. Our controllable approach enables variable-length generations customizable by action category, across more than 100 categories. To enable intra/inter-category diversity, we model the latent generative space using a Conditional Gaussian Mixture Variational Autoencoder. To enable realistic generation of actions involving locomotion, we decouple the local pose and global trajectory components of the action sequence. We incorporate duration-aware feature representations to enable variable-length sequence generation. We use a hybrid pose sequence representation with 3D pose sequences sourced from videos and 3D Kinect-based sequences from NTU-RGBD-120. To enable principled comparison of generation quality, we employ suitably modified strong baselines during evaluation. Although smaller and simpler than the baselines, MUGL provides better-quality generations, paving the way for practical and controllable large-scale human action generation.
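As a rough illustration of the conditional Gaussian-mixture latent prior described above, the sketch below samples a latent code for a given action category and notes where pose/trajectory decoupling would enter. All names, shapes, and the one-component-per-category assignment are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical sketch of sampling from a conditional Gaussian-mixture
# latent prior: the action category selects a mixture component, whose
# Gaussian is then sampled to obtain the latent code z.

rng = np.random.default_rng(0)

NUM_COMPONENTS = 120  # assumption: one component per action category
LATENT_DIM = 32       # assumed latent dimensionality

# Hypothetical learned prior parameters (random stand-ins here).
prior_means = rng.normal(size=(NUM_COMPONENTS, LATENT_DIM))
prior_logvars = np.zeros((NUM_COMPONENTS, LATENT_DIM))

def sample_latent(category: int) -> np.ndarray:
    """Draw z ~ N(mu_c, sigma_c^2) for the component tied to `category`."""
    mu = prior_means[category]
    std = np.exp(0.5 * prior_logvars[category])
    return mu + std * rng.normal(size=LATENT_DIM)

z = sample_latent(category=7)
# In the full model, a decoder would map (z, duration) to a local pose
# sequence while a separate head predicts the global root trajectory;
# composing the two recovers world-space motion with locomotion.
assert z.shape == (LATENT_DIM,)
```

Sampling different components for the same category (or perturbing z within one component) is what yields intra-category diversity in such mixture priors.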