We study conversational dialog in which there are many possible responses to a given history. We present the MultiTalk Dataset, a corpus of over 320,000 sentences of written conversational dialog that balances a high branching factor (10) with several conversation turns (6) through selective branch continuation. We make several contributions toward studying dialog generation in this highly branching setting. To evaluate a diverse set of generations, we propose a simple scoring algorithm, based on bipartite graph matching, that optimally incorporates a set of diverse references. We study multiple language generation tasks at different levels of predictive conversation depth, using textual attributes induced automatically from pretrained classifiers. Our culminating task is a challenging theory-of-mind problem: a controllable generation task that requires reasoning about the expected reaction of the listener.
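The matching-based evaluation can be illustrated with a minimal sketch. It assumes, hypothetically, that each generation/reference pair is scored by token-level unigram F1 and that the total matched similarity is normalized by the number of references; neither the similarity function nor the normalization is taken from the paper. The names `unigram_f1` and `matching_score` are illustrative only. Each reference can be credited to at most one generation, so an exact maximum-weight bipartite matching is found here with a dynamic program over subsets of references, which stays cheap for about 10 references.

```python
from collections import Counter
from typing import Sequence


def unigram_f1(hyp: str, ref: str) -> float:
    """Hypothetical pairwise similarity: token-level F1 between two sentences."""
    h, r = hyp.lower().split(), ref.lower().split()
    if not h or not r:
        return 0.0
    overlap = sum((Counter(h) & Counter(r)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(h), overlap / len(r)
    return 2 * precision * recall / (precision + recall)


def matching_score(generations: Sequence[str], references: Sequence[str]) -> float:
    """Maximum-weight bipartite matching between generations and references,
    normalized by the number of references (an assumed normalization).

    Solved exactly with a DP over subsets of references; each generation is
    either matched to one unused reference or left unmatched.
    """
    m = len(references)
    if m == 0:
        return 0.0
    # Pairwise similarity matrix: rows are generations, columns are references.
    sim = [[unigram_f1(g, r) for r in references] for g in generations]
    neg = float("-inf")
    # best[mask] = best total similarity using exactly the references in mask.
    best = [neg] * (1 << m)
    best[0] = 0.0
    for row in sim:
        nxt = best[:]  # option: leave this generation unmatched
        for mask in range(1 << m):
            if best[mask] == neg:
                continue
            for j in range(m):
                if not mask & (1 << j):
                    cand = best[mask] + row[j]
                    if cand > nxt[mask | (1 << j)]:
                        nxt[mask | (1 << j)] = cand
        best = nxt
    return max(best) / m


refs = ["see you later", "i like tea"]
print(matching_score(["i like tea", "see you later"], refs))  # 1.0
print(matching_score(["i like tea", "i like tea"], refs))     # 0.5
```

Because of the one-to-one constraint, repeating a single good response earns credit for only one reference, so the score rewards generation sets that cover the diversity of the references rather than one mode.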