The rise of personal assistants has made conversational question answering (ConvQA) a very popular mechanism for user-system interaction. State-of-the-art methods for ConvQA over knowledge graphs (KGs) can only learn from crisp question-answer pairs found in popular benchmarks. In reality, however, such training data is hard to come by: users would rarely mark answers explicitly as correct or wrong. In this work, we take a step towards a more natural learning paradigm - from noisy and implicit feedback via question reformulations. A reformulation is likely to be triggered by an incorrect system response, whereas a new follow-up question could be a positive signal on the previous turn's answer. We present a reinforcement learning model, termed CONQUER, that can learn from a conversational stream of questions and reformulations. CONQUER models the answering process as multiple agents walking in parallel on the KG, where the walks are determined by actions sampled using a policy network. This policy network takes the question along with the conversational context as inputs and is trained via noisy rewards obtained from the reformulation likelihood. To evaluate CONQUER, we create and release ConvRef, a benchmark with about 11k natural conversations containing around 205k reformulations. Experiments show that CONQUER successfully learns to answer conversational questions from noisy reward signals, significantly improving over a state-of-the-art baseline.
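To make the learning setup concrete, below is a minimal, hypothetical sketch of this kind of training signal in PyTorch. It is not the authors' implementation: it assumes the question plus conversational context and each candidate KG action (outgoing edge) are already encoded as fixed-size vectors, and it collapses the reformulation-likelihood reward into a simple per-agent scalar. The names PolicyNetwork and reinforce_step are illustrative, not taken from the paper.

# Illustrative sketch only (assumed names, not the authors' code): a policy
# network scores candidate KG edges given the encoded question + context,
# parallel agents sample walks, and a REINFORCE-style update uses a noisy
# reward derived from whether the user reformulated the question.
import torch
import torch.nn as nn

class PolicyNetwork(nn.Module):
    """Maps (question + context encoding, candidate action encodings) to action log-probs."""
    def __init__(self, dim: int):
        super().__init__()
        self.project = nn.Linear(dim, dim)

    def forward(self, question_ctx: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
        # question_ctx: (dim,), actions: (num_actions, dim), e.g. from a frozen text encoder
        logits = actions @ self.project(question_ctx)   # one score per outgoing KG edge
        return torch.log_softmax(logits, dim=-1)

def reinforce_step(policy, optimizer, question_ctx, actions_per_agent, rewards):
    """One policy-gradient update over agents walking in parallel from different start entities.

    rewards: one noisy scalar per agent, e.g. +1 if the user moved on with a new
    question (implicit approval), -1 if the user reformulated (implicit rejection).
    """
    loss = torch.zeros(())
    for actions, reward in zip(actions_per_agent, rewards):
        log_probs = policy(question_ctx, actions)
        chosen = torch.distributions.Categorical(logits=log_probs).sample()
        loss = loss - reward * log_probs[chosen]         # REINFORCE objective
    optimizer.zero_grad()
    (loss / len(rewards)).backward()
    optimizer.step()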