Artificial Intelligence (AI) has a huge impact on our daily lives through applications such as voice assistants, facial recognition, chatbots, and self-driving cars. Natural Language Processing (NLP) is a discipline at the intersection of AI and linguistics, dedicated to the study of text understanding. This is a very challenging area because language is unstructured and full of ambiguities and corner cases. In this thesis we address a specific area of NLP: the understanding of entities (e.g., names of people, organizations, and locations) in text. First, we introduce a radically different, entity-centric view of the information in text. We argue that instead of interpreting individual mentions in isolation, we should build applications that operate in terms of entity concepts. Next, we present a more detailed model of how the entity-centric approach can be applied to the entity linking task. We show that this task can be improved by performing entity linking at the level of coreference clusters rather than for each mention individually. In subsequent work, we further study how information about Knowledge Base entities can be integrated into text. Finally, we analyze how entities evolve over time.