Models for question answering, dialogue agents, and summarization often interpret the meaning of a sentence in a rich context and use that meaning in a new context. Taking excerpts of text can be problematic, as key pieces may not be explicit in a local window. We isolate and define the problem of sentence decontextualization: taking a sentence together with its context and rewriting it to be interpretable out of context, while preserving its meaning. We describe an annotation procedure, collect data on the Wikipedia corpus, and use the data to train models to automatically decontextualize sentences. We present preliminary studies that show the value of sentence decontextualization in a user-facing task, and as preprocessing for systems that perform document understanding. We argue that decontextualization is an important subtask in many downstream applications, and that the definitions and resources provided can benefit tasks that operate on sentences that occur in a richer context.