Open-domain conversational agents can answer a broad range of targeted queries. However, the sequential nature of interaction with these systems makes knowledge exploration a lengthy task that burdens the user with asking a chain of well-phrased questions. In this paper, we present a retrieval-based system and an associated dataset for predicting the next questions a user might ask. Such a system can proactively assist users in knowledge exploration, leading to a more engaging dialog. The retrieval system is trained on a dataset of ~14K multi-turn information-seeking conversations, each paired with a valid follow-up question and a set of invalid candidates. The invalid candidates are generated to simulate various syntactic and semantic confounders such as paraphrases, partial entity matches, irrelevant entities, and ASR errors. We use confounder-specific techniques to simulate these negative examples on the OR-QuAC dataset and develop a dataset called the Follow-up Query Bank (FQ-Bank). We then train ranking models on FQ-Bank and present results comparing supervised and unsupervised approaches. The results suggest that we can retrieve the valid follow-ups by ranking them above the confounders, but further knowledge grounding can improve ranking performance.
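To make the unsupervised setting concrete, the sketch below ranks candidate follow-up questions by bag-of-words cosine similarity to the conversation context. This is a minimal illustrative baseline, not the paper's actual ranker; the function names and the toy conversation are hypothetical, not drawn from FQ-Bank.

```python
import math
import re
from collections import Counter

def bow(text: str) -> Counter:
    """Lowercased bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_followups(context: str, candidates: list[str]) -> list[str]:
    """Order candidate follow-up questions by similarity to the context."""
    ctx = bow(context)
    return sorted(candidates, key=lambda c: -cosine(ctx, bow(c)))

# Toy conversation context and candidates (illustrative, not from FQ-Bank).
context = "Who founded SpaceX? SpaceX was founded by Elon Musk in 2002."
candidates = [
    "What rockets does SpaceX launch?",   # valid follow-up
    "Who founded a bakery in Paris?",     # partial-entity confounder
    "What is the capital of France?",     # irrelevant-entity confounder
]
ranked = rank_followups(context, candidates)
```

On this toy example the irrelevant-entity candidate falls to the bottom, but the partial-entity confounder outranks the valid follow-up on lexical overlap alone, which is exactly the failure mode the confounder set is designed to expose and a reason surface-level baselines are insufficient.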