Hansel: A Chinese Few-Shot and Zero-Shot Entity Linking Benchmark

Modern Entity Linking (EL) systems entrench a popularity bias, yet there is no dataset focusing on tail and emerging entities in languages other than English. We present Hansel, a new benchmark in Chinese that fills the gap in non-English few-shot and zero-shot EL challenges. The test set of Hansel is human-annotated and reviewed, and was created with a novel method for collecting zero-shot EL datasets. It covers 10K diverse documents from news, social media posts, and other web articles, with Wikidata as its target knowledge base. We demonstrate that the existing state-of-the-art EL system performs poorly on Hansel (R@1 of 36.6% on Few-Shot). We then establish a strong baseline that scores an R@1 of 46.2% on Few-Shot and 76.6% on Zero-Shot on our dataset. We also show that our baseline achieves competitive results on the TAC-KBP2015 Chinese Entity Linking task.
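The R@1 figures quoted above can be read as recall at rank 1: the share of mentions for which the top-ranked candidate is the gold entity. The abstract does not spell out the evaluation protocol, so the following is only a minimal sketch under that assumption. The function name `recall_at_k` and the entity IDs in the example are placeholders for illustration, not taken from the paper or the dataset.

```python
from typing import Sequence


def recall_at_k(
    ranked_candidates: Sequence[Sequence[str]],
    gold_ids: Sequence[str],
    k: int = 1,
) -> float:
    """Fraction of mentions whose gold entity ID appears in the top-k candidates.

    ranked_candidates[i] is the model's ranked list of entity IDs for mention i
    (best first); gold_ids[i] is the annotated entity ID for that mention.
    """
    if len(ranked_candidates) != len(gold_ids):
        raise ValueError("ranked_candidates and gold_ids must have the same length")
    if not gold_ids:
        return 0.0
    hits = sum(
        1
        for candidates, gold in zip(ranked_candidates, gold_ids)
        if gold in candidates[:k]
    )
    return hits / len(gold_ids)


# Placeholder Wikidata-style IDs, invented for this example only.
predictions = [["Q1", "Q2"], ["Q3"], ["Q5", "Q4"], []]
gold = ["Q1", "Q4", "Q4", "Q9"]

r_at_1 = recall_at_k(predictions, gold, k=1)
r_at_2 = recall_at_k(predictions, gold, k=2)
print(f"R@1 = {r_at_1:.2f}, R@2 = {r_at_2:.2f}")
```

A mention with an empty candidate list counts as a miss here, which is one possible convention; an evaluation that skips such mentions would report different numbers.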

Related content

Few-shot learning

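Few-shot learning (FSL) addresses how to improve a neural network's performance when only a small amount of data is available. A commonly used family of FSL methods is meta-learning. Like ordinary neural network training, meta-learning has a training phase and a testing phase, but they are called meta-training and meta-testing, respectively.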
