Knowledge-grounded dialogue systems are intended to convey information that is based on evidence provided in a given source text. We discuss the challenges of training a generative neural dialogue model for such systems that is controlled to stay faithful to the evidence. Existing datasets contain a mix of conversational responses that are faithful to selected evidence and responses that are more subjective or chit-chat in style. We propose evaluation measures that disentangle these response styles by quantifying informativeness and objectivity. At training time, additional inputs based on these evaluation measures are given to the dialogue model. At generation time, these additional inputs act as stylistic controls that encourage the model to generate responses that are faithful to the provided evidence. We also investigate the use of additional controls at decoding time through resampling techniques. In addition to reporting automatic metrics, we perform a human evaluation study in which raters judge the output of these controlled generation models to be generally more objective and more faithful to the evidence than the output of baseline dialogue systems.
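The control mechanism described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the control-token names, the first-person heuristic standing in for objectivity, the lexical-overlap heuristic standing in for informativeness, the 0.5 threshold, and all function names are hypothetical. The sketch only shows the general pattern: measures computed on the gold response become extra inputs at training time, the desired values are fixed at generation time, and candidates can be resampled at decoding time until one passes a check.

```python
from typing import Callable, Set

FIRST_PERSON = {"i", "me", "my", "we", "our"}  # assumed proxy for subjectivity


def words(text: str) -> Set[str]:
    """Lowercase the text and strip basic punctuation from each word."""
    return {w.strip(".,!?").lower() for w in text.split()}


def objectivity_token(response: str) -> str:
    """Hypothetical objectivity measure: first-person words mark a subjective reply."""
    return "<subjective>" if words(response) & FIRST_PERSON else "<objective>"


def overlap_token(response: str, evidence: str) -> str:
    """Hypothetical informativeness measure: share of response words found in the evidence."""
    resp = words(response)
    ratio = len(resp & words(evidence)) / max(len(resp), 1)
    return "<high-overlap>" if ratio >= 0.5 else "<low-overlap>"  # assumed threshold


def build_training_input(evidence: str, history: str, response: str) -> str:
    """Training time: control tokens are derived from the gold response."""
    controls = f"{objectivity_token(response)} {overlap_token(response, evidence)}"
    return f"{controls} {evidence} {history}"


def build_generation_input(evidence: str, history: str) -> str:
    """Generation time: control tokens are fixed to the desired style."""
    return f"<objective> <high-overlap> {evidence} {history}"


def resample(generate: Callable[[], str],
             is_acceptable: Callable[[str], bool],
             max_tries: int = 10) -> str:
    """Decoding time: draw new candidates until one passes the check or tries run out."""
    candidate = generate()
    for _ in range(max_tries - 1):
        if is_acceptable(candidate):
            break
        candidate = generate()
    return candidate
```

In this sketch, `generate` would wrap a sampling call to whatever dialogue model is in use, and `is_acceptable` could reuse the same measures that produce the control tokens. If no candidate passes within `max_tries`, the last draw is returned, which is one possible fallback among several.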