Open Information Extraction (OIE) systems seek to compress the factual propositions of a sentence into a series of n-ary tuples. These tuples are useful for downstream natural language processing tasks such as knowledge base creation, textual entailment, and natural language understanding. However, current OIE datasets are limited in both size and diversity. We introduce a new dataset by converting the QA-SRL 2.0 dataset to a large-scale OIE dataset (LSOIE). Our LSOIE dataset is 20 times larger than the next largest human-annotated OIE dataset. We construct and evaluate several benchmark OIE models on LSOIE, providing baselines for future improvements on the task. Our LSOIE data, models, and code are made publicly available.
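To make the notion of an n-ary tuple concrete, the following is a minimal sketch of how a single extraction might be represented. The `Extraction` class, its field names, and the example sentence are illustrative assumptions and are not taken from the LSOIE data or code; the flat (argument, predicate, argument, ...) layout is one common convention and may differ from the format the dataset actually uses.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Extraction:
    """Hypothetical container for one n-ary OIE tuple (not the LSOIE schema)."""

    predicate: str
    arguments: Tuple[str, ...]

    @property
    def arity(self) -> int:
        # Number of arguments attached to the predicate.
        return len(self.arguments)

    def as_tuple(self) -> Tuple[str, ...]:
        # Flat layout: first argument, then the predicate, then remaining arguments.
        if not self.arguments:
            return (self.predicate,)
        first, *rest = self.arguments
        return (first, self.predicate, *rest)


# Hand-written illustration; this sentence is not drawn from any dataset.
sentence = "Marie Curie won the Nobel Prize in Physics in 1903."
extraction = Extraction(
    predicate="won",
    arguments=("Marie Curie", "the Nobel Prize in Physics", "in 1903"),
)

print(extraction.as_tuple())
```

Here a single predicate carries three arguments, which is what distinguishes an n-ary tuple from a binary (subject, relation, object) triple: the temporal phrase is kept as its own argument rather than being folded into the object or dropped.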