Robustness Evaluation of Entity Disambiguation Using Prior Probes: the Case of Entity Overshadowing - 专知论文

entity · named entity disambiguation · prior probability · Performer · CASE

August 24, 2021

Vera Provatorova, Svitlana Vakulenko, Samarth Bhargav, Evangelos Kanoulas

Entity disambiguation (ED) is the last step of entity linking (EL), when candidate entities are reranked according to the context they appear in. All datasets for training and evaluating models for EL consist of convenience samples, such as news articles and tweets, that propagate the prior probability bias of the entity distribution towards more frequently occurring entities. It was previously shown that the performance of the EL systems on such datasets is overestimated since it is possible to obtain higher accuracy scores by merely learning the prior. To provide a more adequate evaluation benchmark, we introduce the ShadowLink dataset, which includes 16K short text snippets annotated with entity mentions. We evaluate and report the performance of popular EL systems on the ShadowLink benchmark. The results show a considerable difference in accuracy between more and less common entities for all of the EL systems under evaluation, demonstrating the effects of prior probability bias and entity overshadowing.
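The effect described in the abstract can be illustrated with a minimal sketch. It is not the authors' method and does not use ShadowLink data: the mentions, entity names, and counts below are invented toy values, and the function names are hypothetical. A baseline that only learns the prior (the most frequent entity per mention) looks accurate on a sample that follows the skewed entity distribution, and much worse once every candidate entity is represented equally.

```python
from collections import Counter, defaultdict

# Toy, invented data: (mention, entity) pairs with a skewed entity distribution.
# Entity names and counts are assumptions for illustration only.
SKEWED_SAMPLE = (
    [("Paris", "Paris_(city)")] * 8
    + [("Paris", "Paris_(mythology)")] * 1
    + [("Paris", "Paris_(film)")] * 1
    + [("Java", "Java_(programming_language)")] * 7
    + [("Java", "Java_(island)")] * 3
)


def learn_priors(pairs):
    """Count how often each entity is the target of each mention."""
    counts = defaultdict(Counter)
    for mention, entity in pairs:
        counts[mention][entity] += 1
    return counts


def predict_by_prior(counts, mention):
    """Ignore context entirely and return the most frequent entity for the mention."""
    if mention not in counts:
        return None
    return counts[mention].most_common(1)[0][0]


def accuracy(counts, pairs):
    """Fraction of pairs for which the prior-only prediction matches the gold entity."""
    correct = sum(predict_by_prior(counts, m) == gold for m, gold in pairs)
    return correct / len(pairs)


priors = learn_priors(SKEWED_SAMPLE)

# Evaluation set that mirrors the skewed distribution (a convenience sample).
skewed_accuracy = accuracy(priors, SKEWED_SAMPLE)

# Evaluation set with one example per candidate entity, so that
# overshadowed (less common) entities count as much as common ones.
balanced_sample = sorted(set(SKEWED_SAMPLE))
balanced_accuracy = accuracy(priors, balanced_sample)

print(f"prior-only accuracy, skewed sample:   {skewed_accuracy:.2f}")
print(f"prior-only accuracy, balanced sample: {balanced_accuracy:.2f}")
```

On this toy data the prior-only baseline scores 0.75 on the skewed sample and 0.40 on the balanced one, which is the kind of gap between more and less common entities that the abstract attributes to prior probability bias.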
