We present Text2Gestures, a transformer-based learning method to interactively generate emotive full-body gestures for virtual agents, aligned with natural language text inputs. Our method generates emotionally expressive gestures by utilizing the relevant biomechanical features for body expressions, also known as affective features. Our generation pipeline also takes into account the intended task corresponding to the text and the target virtual agent's intended gender and handedness. We train and evaluate our network on the MPI Emotional Body Expressions Database (EBEDB) and observe that it achieves state-of-the-art performance in generating gestures aligned with the text for narration or conversation. Our network can generate these gestures at interactive rates on a commodity GPU. In a web-based user study, around 91% of participants rated our generated gestures as at least plausible on a five-point Likert scale. The emotions participants perceived from the gestures are also strongly positively correlated with the corresponding intended emotions, with a minimum Pearson coefficient of 0.77 in the valence dimension.
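The abstract gives only a high-level description of the pipeline, so the following is a minimal, hypothetical sketch of the kind of conditioned transformer it describes: text tokens plus agent attributes (intended task, gender, handedness) are encoded, and a decoder produces a full-body pose sequence. All class and variable names, dimensions, and the conditioning scheme (attribute embeddings prepended as extra encoder tokens) are illustrative assumptions, not the authors' released architecture.

```python
# Hypothetical sketch of a text- and attribute-conditioned gesture transformer.
# Not the authors' code; shapes and conditioning choices are assumptions.
import torch
import torch.nn as nn

class Text2GestureSketch(nn.Module):
    def __init__(self, vocab_size=10000, d_model=256, n_joints=23,
                 n_heads=4, n_layers=4, max_len=512):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        # Small integer ids for agent attributes (e.g., task, gender,
        # handedness), embedded and prepended as extra conditioning tokens.
        self.attr_emb = nn.Embedding(16, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            num_layers=n_layers)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True),
            num_layers=n_layers)
        # Each pose frame = 3D positions of all joints, flattened.
        self.pose_in = nn.Linear(n_joints * 3, d_model)
        self.pose_out = nn.Linear(d_model, n_joints * 3)

    def forward(self, text_ids, attr_ids, prev_poses):
        # text_ids:   (B, T_text) token ids of the input sentence
        # attr_ids:   (B, n_attrs) agent-attribute ids
        # prev_poses: (B, T_pose, n_joints * 3) pose history (teacher forcing)
        T = text_ids.shape[1]
        pos = torch.arange(T, device=text_ids.device)
        src = self.token_emb(text_ids) + self.pos_emb(pos)
        src = torch.cat([self.attr_emb(attr_ids), src], dim=1)
        memory = self.encoder(src)
        Tp = prev_poses.shape[1]
        tgt = self.pose_in(prev_poses) + self.pos_emb(
            torch.arange(Tp, device=text_ids.device))
        # Causal mask: each frame attends only to earlier frames.
        mask = torch.triu(
            torch.full((Tp, Tp), float('-inf'), device=text_ids.device),
            diagonal=1)
        out = self.decoder(tgt, memory, tgt_mask=mask)
        return self.pose_out(out)  # (B, T_pose, n_joints * 3) predicted frames

model = Text2GestureSketch()
poses = model(torch.randint(0, 10000, (2, 12)),  # a 12-token sentence
              torch.randint(0, 16, (2, 3)),      # task, gender, handedness ids
              torch.zeros(2, 8, 23 * 3))         # 8 frames of pose history
print(poses.shape)  # torch.Size([2, 8, 69])
```

At inference time such a model would be run autoregressively, feeding each predicted frame back in as pose history; the lightweight decoder pass per frame is consistent with the interactive generation rates reported in the abstract.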