Being Comes from Not-being: Open-vocabulary Text-to-Motion Generation with Wordless Training

Text-to-motion generation is an emerging and challenging problem that aims to synthesize motion with the same semantics as the input text. However, due to the lack of diverse labeled training data, most approaches are either limited to specific types of text annotations or require online optimization to adapt to the texts during inference, at the cost of efficiency and stability. In this paper, we investigate offline open-vocabulary text-to-motion generation in a zero-shot learning manner that requires neither paired training data nor extra online optimization to adapt to unseen texts. Inspired by prompt learning in NLP, we pretrain a motion generator that learns to reconstruct the full motion from a masked motion. During inference, instead of changing the motion generator, our method reformulates the input text into a masked motion that serves as the prompt for the motion generator to "reconstruct" the motion. In constructing the prompt, the unmasked poses are synthesized by a text-to-pose generator. To supervise the optimization of the text-to-pose generator, we propose the first text-pose alignment model for measuring the alignment between texts and 3D poses. To prevent the pose generator from overfitting to limited training texts, we further propose a novel wordless training mechanism that optimizes the text-to-pose generator without any training texts. Comprehensive experimental results show that our method obtains a significant improvement over the baseline methods. The code is available at https://github.com/junfanlin/oohmg.
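
The idea of a masked motion used as a prompt can be illustrated with a small toy sketch. This is not the authors' implementation: the function names, the 3-dimensional "poses", and the key-frame positions are all hypothetical, and linear interpolation is used only as a stand-in for the pretrained motion generator that would fill in the masked frames.

```python
import numpy as np

def build_motion_prompt(key_poses, key_frames, num_frames):
    """Place the given poses at key_frames; every other frame stays masked (zeros)."""
    key_poses = np.asarray(key_poses, dtype=float)
    pose_dim = key_poses.shape[1]
    motion = np.zeros((num_frames, pose_dim))
    mask = np.zeros(num_frames, dtype=bool)  # True = pose is given (unmasked)
    motion[key_frames] = key_poses
    mask[key_frames] = True
    return motion, mask

def interpolate_masked(motion, mask):
    """Stand-in for the learned generator: fill masked frames by linear interpolation."""
    frames = np.arange(motion.shape[0])
    known = frames[mask]
    filled = np.empty_like(motion)
    for d in range(motion.shape[1]):
        filled[:, d] = np.interp(frames, known, motion[known, d])
    return filled

# Hypothetical poses standing in for the output of a text-to-pose generator.
key_poses = [[0.0, 0.0, 0.0], [1.0, 2.0, -1.0]]
prompt, mask = build_motion_prompt(key_poses, key_frames=[0, 4], num_frames=5)
motion = interpolate_masked(prompt, mask)
```

In this toy setup the unmasked frames are preserved exactly and only the masked frames are filled in, which mirrors the role the abstract assigns to the prompt; a learned generator would replace the interpolation step.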
