While rich, open-domain textual data are generally available and may include interesting phenomena (humor, sarcasm, empathy, etc.), most are designed for language processing tasks and are usually in a non-conversational format. In this work, we take a step towards automatically generating conversational data using Generative Conversational Networks (GCN), aiming to benefit from the breadth of available language and knowledge data and to train open-domain social conversational agents. We evaluate our approach on conversations with and without knowledge on the Topical Chat dataset using automatic metrics and human evaluators. Our results show that, for conversations without knowledge grounding, GCN can generalize from the seed data, producing novel conversations that are less relevant but more engaging; for knowledge-grounded conversations, it can produce more knowledge-focused, fluent, and engaging conversations. Specifically, we show that for open-domain conversations with 10% of seed data, our approach performs close to the baseline that uses 100% of the data, while for knowledge-grounded conversations, it achieves the same using only 1% of the data, on human ratings of engagingness, fluency, and relevance.