Understanding the relations between entities denoted by NPs in text is a critical part of human-like natural language understanding. However, only a fraction of such relations is covered by current NLP tasks and models. In this work, we establish the task of text-based NP enrichment (TNE), that is, enriching each NP with all the preposition-mediated relations that hold between it and the other NPs in the text. The relations are represented as triplets, each denoting two NPs linked via a preposition. Humans recover such relations seamlessly, while current state-of-the-art models struggle with them due to the implicit nature of the problem. We build the first large-scale dataset for the problem, provide the formal framing and scope of annotation, analyze the data, and report the results of fine-tuned neural language models on the task, demonstrating the challenge it poses to current technology. We created a webpage at yanaiela.github.io/TNE/ with the data, a data-exploration UI, code, models, and a demo to foster further research into this challenging text-understanding problem.
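To make the triplet representation concrete, here is a minimal sketch in Python. The class and field names (NPRelation, anchor, preposition, complement) and the example sentence are our own illustrations, not necessarily the schema used in the released dataset.

```python
from dataclasses import dataclass


@dataclass
class NPRelation:
    """One TNE link: two NPs in the text connected by a preposition.

    Field names are illustrative and may differ from the dataset's schema.
    """
    anchor: str        # the NP being enriched
    preposition: str   # the mediating preposition
    complement: str    # the NP it is linked to


# Constructed example text: "We entered the restaurant. The waiter brought the menu."
# TNE-style enrichment recovers implicit, preposition-mediated links such as:
relations = [
    NPRelation(anchor="the waiter", preposition="of", complement="the restaurant"),
    NPRelation(anchor="the menu", preposition="of", complement="the restaurant"),
]

for r in relations:
    print(f"({r.anchor}, {r.preposition}, {r.complement})")
```

Note that neither relation is stated explicitly in the text; recovering them requires the kind of implicit inference the task is designed to probe.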