Spoken dialogue systems that help users complete complex tasks, such as booking movie tickets, have become an emerging research topic in artificial intelligence and natural language processing. With a well-designed dialogue system acting as an intelligent personal assistant, people can accomplish such tasks more easily through natural language interaction. Several virtual intelligent assistants are on the market today; however, most of them focus only on textual or vocal interaction. In this paper, we present HUMBO, a system that generates dialogue responses and simultaneously synthesizes the corresponding facial expressions for better multimodal interaction. HUMBO can (1) let users determine the appearance of the virtual assistant from a single image, and (2) generate coherent emotional utterances and facial expressions on the user-provided image. This is not only a brand-new research direction but, more importantly, an ultimate step toward more human-like virtual assistants.