Open information extraction (OIE) is an important NLP task that aims to extract structured information from unstructured text without restricting the relation types or the domain of the text. This survey covers OIE technologies from 2007 to 2022, with a focus on new models not covered by previous surveys. We propose a new categorization based on the source of information to accommodate recent developments in OIE. In addition, we summarize three major approaches based on task settings, as well as currently popular datasets and model evaluation metrics. Building on this comprehensive review, we outline several future directions in terms of datasets, sources of information, output forms, methods, and evaluation metrics.