Hyperlinks constitute the backbone of the Web; they enable user navigation, information discovery, content ranking, and many other crucial services on the Internet. In particular, hyperlinks within Wikipedia allow readers to navigate from one page to another, either to expand their knowledge of a subject of interest or to discover a new one. However, despite Wikipedia editors' efforts to add and maintain content, links remain sparse in many language editions. This paper introduces a machine-in-the-loop entity linking system that can comply with community guidelines for adding links and aims to increase link coverage in new pages and in low-resource wiki-projects. To tackle these challenges, we build a context- and language-agnostic entity linking model that combines data collected from millions of anchors found across wiki-projects with billions of users' reading sessions. We develop an interactive recommendation interface that proposes candidate links to editors, who can confirm, reject, or adapt each recommendation; the overall aim is to provide a more accessible editing experience for newcomers through structured tasks. Our system's design choices were made in collaboration with members of several language communities. When the system is implemented as part of Wikipedia, its usage by volunteer editors will help us build a continuous evaluation dataset with active feedback. Our experimental results show that our link recommender can achieve a precision above 80% while ensuring a recall of at least 50% across 6 languages covering different sizes, continents, and language families.
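To make the reported precision/recall trade-off concrete, the following minimal sketch shows how a confidence threshold on candidate links could be evaluated against links that editors have confirmed or rejected. Everything in it is a hypothetical illustration: the `precision_recall` helper, the confidence scores, the labels, and the threshold values are assumptions made for this example, not the system's actual model, data, or results.

```python
from typing import List, Tuple

# Each candidate link is (model confidence, whether an editor would confirm it).
# These values are invented for illustration only.
Candidate = Tuple[float, bool]


def precision_recall(scored: List[Candidate], threshold: float) -> Tuple[float, float]:
    """Precision and recall of the links recommended at a given threshold."""
    # Only candidates at or above the threshold are shown to editors.
    recommended = [correct for score, correct in scored if score >= threshold]
    true_positives = sum(recommended)
    total_correct = sum(correct for _, correct in scored)
    # Convention chosen here: report 0.0 when nothing is recommended.
    precision = true_positives / len(recommended) if recommended else 0.0
    recall = true_positives / total_correct if total_correct else 0.0
    return precision, recall


candidates: List[Candidate] = [
    (0.95, True), (0.90, True), (0.85, True), (0.80, False),
    (0.70, True), (0.40, True), (0.30, False), (0.20, False),
]

if __name__ == "__main__":
    # Raising the threshold trades recall for precision.
    for t in (0.0, 0.6, 0.9):
        p, r = precision_recall(candidates, t)
        print(f"threshold={t:.1f} precision={p:.3f} recall={r:.3f}")
```

In a sketch like this, the threshold would be chosen as the lowest value that still keeps precision above the target, so that recall stays as high as possible.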