Intrinsic evaluations of OIE systems are carried out either manually -- with human evaluators judging the correctness of extractions -- or automatically, on standardized benchmarks. The latter, while much more cost-effective, is less reliable, primarily because of the incompleteness of the existing OIE benchmarks: the ground truth extractions do not include all acceptable variants of the same fact, leading to unreliable assessment of models' performance. Moreover, the existing OIE benchmarks are available for English only. In this work, we introduce BenchIE: a benchmark and evaluation framework for comprehensive evaluation of OIE systems for English, Chinese and German. In contrast to existing OIE benchmarks, BenchIE takes into account informational equivalence of extractions: our gold standard consists of fact synsets, clusters in which we exhaustively list all surface forms of the same fact. We benchmark several state-of-the-art OIE systems using BenchIE and demonstrate that these systems are significantly less effective than indicated by existing OIE benchmarks. We make BenchIE (data and evaluation code) publicly available.
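The fact-synset idea described above can be sketched in a few lines. This is a hedged illustration, not the actual BenchIE scoring code: it assumes gold facts are stored as sets of (subject, relation, object) triples, counts an extraction as correct iff it exactly matches any surface form in some gold synset, and computes fact-level recall over synsets rather than over individual triples. The function name `score_synsets` and the triple representation are illustrative assumptions.

```python
# Hedged sketch (NOT the official BenchIE implementation): synset-based
# scoring. A gold "fact synset" is a set of acceptable surface forms
# (subject, relation, object) of one and the same fact. An extracted
# triple is a true positive iff it matches any form in any synset;
# a synset counts as recalled iff at least one of its forms was extracted.

def score_synsets(extractions, fact_synsets):
    """Return (precision, recall, f1) for a list of extracted triples
    against a list of gold fact synsets (each a set of triples)."""
    matched_synsets = set()  # indices of gold facts covered so far
    true_positives = 0
    for triple in extractions:
        for i, synset in enumerate(fact_synsets):
            if triple in synset:
                matched_synsets.add(i)
                true_positives += 1
                break  # one extraction matches at most one gold fact
    precision = true_positives / len(extractions) if extractions else 0.0
    recall = len(matched_synsets) / len(fact_synsets) if fact_synsets else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Under this scheme a system is not penalized for choosing one acceptable phrasing of a fact over another, which is exactly the incompleteness problem the abstract attributes to earlier benchmarks.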