Task-oriented dialog (TOD) systems often need to formulate knowledge base (KB) queries corresponding to the user intent and use the query results to generate system responses. Existing approaches require dialog datasets to explicitly annotate these KB queries -- such annotations are time-consuming and expensive. In response, we define the novel problems of predicting the KB query and training the dialog agent without explicit KB query annotation. For query prediction, we propose a reinforcement learning (RL) baseline, which rewards the generation of queries whose KB results cover the entities mentioned in the subsequent dialog. Further analysis reveals that correlation among query attributes in the KB can significantly confuse memory augmented policy optimization (MAPO), an existing state-of-the-art RL agent. To address this, we improve the MAPO baseline with simple but important modifications suited to our task. To train the full TOD system for our setting, we propose a pipelined approach: it independently predicts when to make a KB query (query position predictor), then predicts a KB query at the predicted position (query predictor), and uses the results of the predicted query in subsequent dialog (next response predictor). Overall, our work proposes first solutions to our novel problem, and our analysis highlights the research challenges in training TOD systems without query annotation.
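The entity-coverage reward described above could be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function name, the row representation, and the toy KB values are all assumptions introduced here for clarity.

```python
# Hypothetical sketch of an entity-coverage reward: a candidate KB
# query is rewarded by the fraction of entities mentioned in the
# subsequent dialog that appear in the query's result rows.
# All names and data below are illustrative, not from the paper.

def entity_coverage_reward(query_results, subsequent_entities):
    """Fraction of subsequently mentioned entities that appear
    anywhere in the KB rows returned by the predicted query."""
    if not subsequent_entities:
        return 0.0
    returned = {value for row in query_results for value in row.values()}
    covered = sum(1 for entity in subsequent_entities if entity in returned)
    return covered / len(subsequent_entities)

# Toy example: rows returned by a candidate query, and entities the
# system mentions later in the dialog.
rows = [
    {"name": "Hotel Rex", "area": "centre", "price": "cheap"},
    {"name": "Acorn Guest House", "area": "north", "price": "moderate"},
]
mentioned = ["Hotel Rex", "centre"]
print(entity_coverage_reward(rows, mentioned))  # 1.0
```

A query whose results miss the later-mentioned entities would receive a lower reward, which is what lets the RL agent learn query prediction without explicit query annotations.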