We introduce a new open information extraction (OIE) benchmark for pre-trained language models (LMs). Recent studies have demonstrated that pre-trained LMs, such as BERT and GPT, may store linguistic and relational knowledge. In particular, LMs are able to answer ``fill-in-the-blank'' questions when given a pre-defined relation category. Instead of focusing on pre-defined relations, we create an OIE benchmark aiming to fully examine the open relational information present in pre-trained LMs. We accomplish this by turning pre-trained LMs into zero-shot OIE systems. Surprisingly, pre-trained LMs achieve competitive performance on both standard OIE datasets (CaRB and Re-OIE2016) and two new large-scale factual OIE datasets (TAC KBP-OIE and Wikidata-OIE) that we construct via distant supervision. For instance, the zero-shot pre-trained LMs outperform the state-of-the-art supervised OIE methods in F1 score on our factual OIE datasets without using any training data. Our code and datasets are available at https://github.com/cgraywang/IELM.
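To make the ``fill-in-the-blank'' probing concrete, the following minimal sketch (an illustration under our own assumptions, not the exact IELM pipeline) queries a pre-trained masked LM through the HuggingFace transformers fill-mask pipeline; the model name and the prompt are illustrative choices.

```python
# Minimal sketch: probe a pre-trained masked LM with a "fill-in-the-blank" query.
# This illustrates the probing idea only; it is not the paper's IELM procedure.
from transformers import pipeline

# Model choice is an illustrative assumption (any masked LM works here).
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# Ask the LM to complete a relational statement; the top predictions hint at
# the relational knowledge stored in its parameters.
for prediction in unmasker("Barack Obama was born in [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```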