Entity Linking (EL) is the gateway into Knowledge Bases. Recent advances in EL use dense retrieval for Candidate Generation, which addresses some shortcomings of the lookup-based approach of matching NER mentions against pre-computed dictionaries. In this work, we show that in the domain of Tweets such methods suffer, because Tweets often contain informal spelling, provide limited context, and lack specificity, among other issues. We investigate these challenges on a large and recent Tweets benchmark for EL, empirically evaluate lookup and dense retrieval approaches, and demonstrate that a hybrid solution using long contextual representations from Wikipedia is necessary to achieve considerable gains over previous work, reaching 0.93 recall.
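To make the contrast between lookup, dense retrieval, and a hybrid of the two concrete, the following is a minimal sketch and not the authors' system. Everything in it is an assumption made for illustration: the toy knowledge base `KB`, the alias dictionary `ALIASES`, and the function names are hypothetical, and a bag of character trigrams with cosine similarity stands in for a learned dense encoder. Each entity is represented by its title plus a description, loosely mirroring the idea of using longer contextual text from Wikipedia.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base: entity id -> (title, description).
KB = {
    "Q1": ("Barack Obama", "44th president of the United States"),
    "Q2": ("Apple Inc.", "American technology company that makes the iPhone"),
    "Q3": ("Apple", "edible fruit produced by an apple tree"),
}

# Hypothetical pre-computed alias dictionary used by the lookup path.
ALIASES = {
    "obama": ["Q1"],
    "barack obama": ["Q1"],
    "apple": ["Q2", "Q3"],
}


def embed(text, n=3):
    """Stand-in for a learned encoder: a bag of character n-grams."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Entity vectors are built from title plus description (the "long context").
ENTITY_VECS = {eid: embed(f"{title} {desc}") for eid, (title, desc) in KB.items()}


def lookup_candidates(mention):
    """Exact (case-insensitive) match of the mention against the alias dictionary."""
    return list(ALIASES.get(mention.lower(), []))


def dense_candidates(mention, context, k=2):
    """Rank all entities by similarity to the mention plus its context."""
    query = embed(f"{mention} {context}")
    ranked = sorted(KB, key=lambda eid: cosine(query, ENTITY_VECS[eid]), reverse=True)
    return ranked[:k]


def hybrid_candidates(mention, context, k=2):
    """Union of both paths: lookup results first, then unseen dense results."""
    merged = lookup_candidates(mention)
    for eid in dense_candidates(mention, context, k):
        if eid not in merged:
            merged.append(eid)
    return merged


def recall(examples, generate):
    """Fraction of (mention, context, gold) examples whose gold entity is generated."""
    hits = sum(gold in generate(mention, context) for mention, context, gold in examples)
    return hits / len(examples)
```

In this toy setting, an informally spelled mention such as `"obamaaa"` misses the alias dictionary entirely, while the similarity-based path can still recover the entity from overlapping character n-grams in the mention and its context. Taking the union of the two candidate lists is only one simple way to combine them; the source does not specify how its hybrid solution merges candidates.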