Co-speech gestures, that is, gestures that accompany speech, play an important role in human communication. Automatic co-speech gesture generation is thus a key enabling technology for embodied conversational agents (ECAs), since humans expect ECAs to be capable of multi-modal communication. Research into gesture generation is rapidly gravitating towards data-driven methods. Unfortunately, individual research efforts in the field are difficult to compare: there are no established benchmarks, and each study tends to use its own dataset, motion visualisation, and evaluation methodology. To address this situation, we launched the GENEA Challenge, a gesture-generation challenge wherein participating teams built automatic gesture-generation systems on a common dataset, and the resulting systems were evaluated in parallel in a large, crowdsourced user study using the same motion-rendering pipeline. Because differences in evaluation outcomes between systems are now solely attributable to differences between the motion-generation methods, the challenge makes it possible to benchmark recent approaches against one another and thereby obtain a clearer picture of the state of the art in the field. This paper reports on the purpose, design, results, and implications of our challenge.