In this paper, we propose a deep learning-based model that generates 3D human motion from text by combining gesture action classification with an autoregressive model. The model focuses on generating special gestures that express human thinking, such as waving and nodding. To this end, the proposed method predicts the expression conveyed by a sentence using a text classification model built on a pretrained language model, and generates the corresponding gesture using a gated recurrent unit (GRU)-based autoregressive model. In particular, we propose a loss on the embedding space so that raw motions are restored and intermediate motions are generated well. Moreover, a novel data augmentation method and a stop token are proposed to generate variable-length motions. To evaluate the text classification model and the 3D human motion generation model, we collected a gesture action classification dataset and an action-based gesture dataset. Across several experiments, the proposed method successfully generates perceptually natural and realistic 3D human motion from text. Moreover, we verified the effectiveness of the proposed method on a publicly available action recognition dataset to evaluate its cross-dataset generalization performance.
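The role of the stop token in producing variable-length motions can be illustrated with a minimal sketch of an autoregressive rollout. This is not the authors' implementation: `generate_motion`, `toy_step`, the `stop_threshold` value, and the pose representation (a flat list of floats) are all assumptions made for illustration, and `toy_step` is a hand-written stand-in for a trained GRU cell conditioned on the predicted gesture class.

```python
from typing import Callable, List, Tuple

Pose = List[float]
# Hypothetical step signature:
# step(prev_pose, hidden, class_id) -> (next_pose, next_hidden, stop_probability)
StepFn = Callable[[Pose, List[float], int], Tuple[Pose, List[float], float]]


def generate_motion(step: StepFn, start_pose: Pose, hidden: List[float],
                    class_id: int, max_len: int = 120,
                    stop_threshold: float = 0.5) -> List[Pose]:
    """Roll out poses frame by frame until the stop token fires or max_len is hit."""
    motion = [list(start_pose)]
    pose = list(start_pose)
    for _ in range(max_len - 1):
        # Each new frame is predicted from the previous frame and the hidden state.
        pose, hidden, p_stop = step(pose, hidden, class_id)
        motion.append(pose)
        if p_stop >= stop_threshold:
            break  # the model signals that the gesture is complete
    return motion


def toy_step(pose: Pose, hidden: List[float], class_id: int):
    """Stand-in for a trained model (assumption): drifts the pose and emits a
    stop probability of 1.0 once a class-dependent number of frames is reached."""
    frames_done = hidden[0] + 1.0
    target_len = 5 + 3 * class_id
    next_pose = [x + 0.1 for x in pose]
    p_stop = 1.0 if frames_done >= target_len - 1 else 0.0
    return next_pose, [frames_done], p_stop


if __name__ == "__main__":
    for cid in (0, 1):
        frames = generate_motion(toy_step, [0.0, 0.0, 0.0], [0.0], cid)
        print(f"class {cid}: {len(frames)} frames")
```

With this toy step function, different gesture classes yield sequences of different lengths, and `max_len` acts as a safety cap if the stop probability never crosses the threshold.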