Neural Approaches to Entity-Centric Information Extraction

Artificial Intelligence (AI) has a huge impact on our daily lives through applications such as voice assistants, facial recognition, chatbots, and autonomous cars. Natural Language Processing (NLP) is a discipline at the intersection of AI and linguistics, dedicated to the study of text understanding. It is a very challenging area because of the unstructured nature of language, with its many ambiguities and corner cases. In this thesis, we address a specific area of NLP: the understanding of entities (e.g., names of people, organizations, and locations) in text. First, we introduce a radically different, entity-centric view of the information in text. We argue that, instead of relying on individual mentions in text to understand their meaning, we should build applications that work in terms of entity concepts. Next, we present a more detailed model of how the entity-centric approach can be used for the entity linking task. We show that this task can be improved by performing entity linking at the level of coreference clusters rather than for each mention individually. In follow-up work, we study how information from Knowledge Base entities can be integrated into text. Finally, we analyze how entities evolve over time.


