Questions asked by humans during a conversation often contain contextual dependencies, i.e., explicit or implicit references to previous dialogue turns. These dependencies take the form of coreferences (e.g., via pronoun use) or ellipses, and can make understanding difficult for automated systems. One way to facilitate the understanding and subsequent processing of a question is to rewrite it into an out-of-context form, i.e., a form that can be understood without the conversational context. We propose CoQAR, a corpus containing $4.5$K conversations from the Conversational Question-Answering dataset CoQA, for a total of $53$K follow-up question-answer pairs. Each original question was manually annotated with at least 2 and at most 3 out-of-context rewritings. CoQAR can be used for the supervised learning of three tasks: question paraphrasing, question rewriting and conversational question answering. In order to assess the quality of CoQAR's rewritings, we conduct several experiments consisting of training and evaluating models for these three tasks. Our results support the idea that question rewriting can be used as a preprocessing step for question answering models, thereby improving their performance.
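The following is a minimal sketch (not the authors' code) of how question rewriting can serve as a preprocessing step for a question answering model: a context-dependent follow-up question is first rewritten into a self-contained form, which is then passed to a standard QA model. The model identifiers and the helper function `answer_in_context` are placeholders introduced for illustration.

```python
# Sketch only: model paths are hypothetical placeholders, e.g. a seq2seq
# model fine-tuned for question rewriting (such as on CoQAR) and an
# off-the-shelf extractive QA model.
from transformers import pipeline

rewriter = pipeline("text2text-generation", model="path/to/rewriting-model")
qa = pipeline("question-answering", model="path/to/qa-model")

def answer_in_context(history, question, passage):
    """Rewrite a context-dependent follow-up question into an
    out-of-context form, then answer it against the supporting passage."""
    # Feed the previous dialogue turns together with the follow-up question
    # to the rewriting model.
    rewriter_input = " ".join(history + [question])
    rewritten = rewriter(rewriter_input, max_new_tokens=64)[0]["generated_text"]
    # The QA model now only needs the self-contained question and the passage.
    return qa(question=rewritten, context=passage)["answer"]

history = ["Q: Who wrote Hamlet?", "A: William Shakespeare."]
print(answer_in_context(history, "When did he write it?", passage="Hamlet was written by Shakespeare around 1600."))
```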