Conversational Question Answering (CQA) aims to answer questions contained within dialogues, which are not easily interpretable without context. Developing a model that rewrites conversational questions into self-contained ones is an emerging solution in industry settings, as it allows reusing existing single-turn QA systems and avoids training a CQA model from scratch. Previous work trains rewriting models using human rewrites as supervision. However, such objectives are disconnected from the QA model, so more human-like rewrites do not guarantee better QA performance. In this paper we propose using QA feedback to supervise the rewriting model with reinforcement learning. Experiments show that our approach effectively improves QA performance over baselines for both extractive and retrieval QA. Furthermore, human evaluation shows that our method generates more accurate and detailed rewrites than human annotations.
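To make the core idea concrete, the following is a minimal REINFORCE-style sketch of training a rewriter with QA feedback as reward, not the paper's exact method. It assumes a seq2seq rewriter policy, a frozen downstream QA model, and gold answers for computing token-level F1; `rewriter.sample` and `qa_model.answer` are hypothetical APIs used only for illustration.

```python
# Illustrative sketch: supervise a question rewriter with QA feedback via REINFORCE.
# `rewriter` (policy) and `qa_model` (frozen QA system) are hypothetical objects.
from collections import Counter
import torch

def token_f1(pred: str, gold: str) -> float:
    """Standard token-overlap F1, as used in extractive QA evaluation."""
    p, g = pred.split(), gold.split()
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)

def rl_step(rewriter, qa_model, optimizer, question, history, gold_answer):
    # Sample a self-contained rewrite from the policy, keeping its log-probability.
    rewrite, log_prob = rewriter.sample(question, history)  # hypothetical API
    # Reward: how well the downstream QA model answers the rewritten question.
    predicted = qa_model.answer(rewrite)                     # hypothetical API
    reward = token_f1(predicted, gold_answer)
    # REINFORCE: increase the likelihood of rewrites that earn high QA reward.
    loss = -reward * log_prob
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward
```

The design point is that the reward comes from the QA model rather than from similarity to human rewrites, which directly ties the rewriting objective to downstream QA performance.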