Equipping machines with comprehensive knowledge of the world's entities and their relationships has been a long-standing goal of AI. Over the last decade, large-scale knowledge bases, also known as knowledge graphs, have been automatically constructed from web content and text sources, and have become a key asset for search engines. This machine knowledge can be harnessed to semantically interpret textual phrases in news, social media and web tables, and contributes to question answering, natural language processing and data analytics. This article surveys fundamental concepts and practical methods for creating and curating large knowledge bases. It covers models and methods for discovering and canonicalizing entities and their semantic types, and for organizing them into clean taxonomies. On top of this, the article discusses the automatic extraction of entity-centric properties. To support the long-term life-cycle and the quality assurance of machine knowledge, the article presents methods for constructing open schemas and for knowledge curation. Case studies on academic projects and industrial knowledge graphs complement the survey of concepts and methods.