Many state-of-the-art natural language understanding (NLU) models are based on pretrained neural language models. These models often make inferences using information from multiple sources. An important class of such inferences is those that require both background knowledge, presumably contained in a model's pretrained parameters, and instance-specific information that is supplied at inference time. However, the integration and reasoning abilities of NLU models in the presence of multiple knowledge sources have been largely understudied. In this work, we propose a test suite of coreference resolution tasks that require reasoning over multiple facts. Our dataset is organized into subtasks that differ in which knowledge sources contain the relevant facts. We evaluate state-of-the-art coreference resolution models on our dataset. Our results indicate that several models struggle to reason on-the-fly over knowledge observed both at pretraining time and at inference time. However, with task-specific training, a subset of models demonstrates the ability to integrate certain knowledge types from multiple sources.