This paper describes the ReprGesture entry to the Generation and Evaluation of Non-verbal Behaviour for Embodied Agents (GENEA) challenge 2022. The GENEA challenge provides processed datasets and performs crowdsourced evaluations to compare the performance of different gesture generation systems. In this paper, we explore an automatic gesture generation system based on multimodal representation learning. We use WavLM features for audio, FastText features for text, and position and rotation matrix features for gesture. Each modality is projected into two distinct subspaces: modality-invariant and modality-specific. To learn the commonalities shared across modalities and to capture the characteristics of the modality-specific representations, a gradient reversal layer (GRL) based adversarial classifier and modality reconstruction decoders are used during training. The gesture decoder then generates appropriate gestures from all learned representations together with rhythm-related audio features. Our code, pre-trained models and demo are available at https://github.com/YoungSeng/ReprGesture.
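The gradient reversal layer mentioned in the abstract acts as an identity map in the forward pass and negates (optionally scales) gradients in the backward pass, so the shared encoder is trained adversarially against the modality classifier. A minimal framework-free sketch of this behavior (an illustration of the general GRL idea, not the authors' implementation; `lambd` is the usual reversal-strength hyperparameter):

```python
def grl_forward(x):
    """Forward pass: the GRL is an identity map."""
    return x

def grl_backward(upstream_grads, lambd=1.0):
    """Backward pass: gradients are negated and scaled by lambd
    before flowing back into the shared encoder."""
    return [-lambd * g for g in upstream_grads]

# The encoder receives reversed gradients from the modality classifier,
# which pushes its output toward modality-invariant representations.
print(grl_backward([1.0, -2.0, 0.5], lambd=0.5))  # [-0.5, 1.0, -0.25]
```

In an autograd framework this would be wrapped in a custom backward function; the key point is only the sign flip between the forward and backward computations.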