Body language, such as conversational gesture, is a powerful way to ease communication. Conversational gestures not only make speech more lively but also carry semantic meaning that helps to stress important information in a discussion. In the field of robotics, giving conversational agents (humanoid robots or virtual avatars) the ability to use gestures properly is critical, yet it remains a task of extraordinary difficulty, because given only text as input there are many possibilities and ambiguities in generating an appropriate gesture. Unlike previous work, we propose a new method that explicitly takes gesture types into account to reduce these ambiguities and generate human-like conversational gestures. Key to our proposed system is a new gesture database built on the TED dataset that allows us to map a word to one of three gesture types: "Imagistic" gestures, which express the content of the speech; "Beat" gestures, which emphasize words; and "No gesture." Our system first maps the words in the input text to their corresponding gesture types, then generates type-specific gestures, and finally combines the generated gestures into one smooth gesture sequence. The effectiveness of the proposed method was confirmed in comparative user studies with both a virtual avatar and a humanoid robot.
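The three-stage pipeline described above (classify each word's gesture type, generate a type-specific motion, then combine the pieces into one smooth sequence) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy `GESTURE_DB`, the hard-coded one-dimensional pose tracks, and the single-frame seam averaging are all hypothetical stand-ins for the paper's TED-derived database, learned generators, and blending step.

```python
from enum import Enum

class GestureType(Enum):
    IMAGISTIC = "imagistic"  # depicts the content of the speech
    BEAT = "beat"            # rhythmic emphasis on a word
    NONE = "none"            # no gesture

# Toy stand-in for the paper's TED-derived word-to-type database.
GESTURE_DB = {"big": GestureType.IMAGISTIC, "really": GestureType.BEAT}

def classify(word: str) -> GestureType:
    """Stage 1: map a word to its gesture type."""
    return GESTURE_DB.get(word.lower(), GestureType.NONE)

def generate(gtype: GestureType) -> list[float]:
    """Stage 2: produce a type-specific motion segment.
    Hard-coded 1-D 'pose' tracks stand in for learned generators."""
    if gtype is GestureType.IMAGISTIC:
        return [0.0, 0.8, 1.0, 0.8, 0.0]  # broad depictive stroke
    if gtype is GestureType.BEAT:
        return [0.0, 0.5, 0.0]            # short emphatic flick
    return [0.0, 0.0, 0.0]                # rest pose

def combine(segments: list[list[float]]) -> list[float]:
    """Stage 3: stitch segments into one track, smoothing each seam
    by averaging the boundary frames (a crude placeholder for the
    paper's gesture-combination step)."""
    track: list[float] = []
    for seg in segments:
        if track:
            track[-1] = 0.5 * (track[-1] + seg[0])
            track.extend(seg[1:])
        else:
            track.extend(seg)
    return track

words = "this is a really big idea".split()
motion = combine([generate(classify(w)) for w in words])
print(motion)
```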