Wikidata is an open knowledge graph built by a global community of volunteers. As it grows in scale, it faces substantial challenges around editor engagement, both in attracting new editors to keep up with the sheer amount of work and in retaining existing ones. Experience from other online communities and peer-production systems, including Wikipedia, suggests that personalised recommendations could help, especially for newcomers, who are sometimes unsure how best to contribute to an ongoing effort. For this reason, we propose WikidataRec, a recommender system for Wikidata items. The system uses a hybrid of content-based and collaborative filtering techniques to rank items for editors, relying on both item features and previous item-editor interactions. A neural network, which we call a neural mixture of representations, is designed to learn fine-grained weights for combining the item-based representations and to optimize them together with the editor-based representation using item-editor interactions. To facilitate further research in this space, we also create two benchmark datasets: a general-purpose one with 220,000 editors responsible for 14 million interactions with 4 million items, and a second one focusing on the contributions of more than 8,000 of the more active editors. We perform an offline evaluation of the system on both datasets, with promising results. Our code and datasets are available at https://github.com/WikidataRec-developer/Wikidata_Recommender.
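To make the ranking idea concrete, the following is a minimal sketch of how several item-based representations might be mixed with learned weights and scored against an editor representation. It is an illustration under stated assumptions, not the WikidataRec implementation: the function names, the softmax gate, the dot-product score, the two-dimensional vectors, and the item identifiers are all hypothetical, and the gate logits are fixed by hand here, whereas in a trained system they would be learned from item-editor interactions.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn arbitrary gate logits into mixture weights that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_item_representations(views: Sequence[Vector],
                             gate_logits: Sequence[float]) -> Vector:
    """Weighted sum of an item's representations (one vector per view)."""
    weights = softmax(gate_logits)
    dim = len(views[0])
    return [sum(w * v[i] for w, v in zip(weights, views)) for i in range(dim)]


def score(editor_vec: Vector, item_vec: Vector) -> float:
    """Assumed relevance score: dot product of editor and item vectors."""
    return sum(e * x for e, x in zip(editor_vec, item_vec))


def rank_items(editor_vec: Vector,
               item_views: Dict[str, List[Vector]],
               gate_logits: Sequence[float]) -> List[str]:
    """Return item ids sorted from most to least relevant for one editor."""
    scored = {
        item_id: score(editor_vec, mix_item_representations(views, gate_logits))
        for item_id, views in item_views.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


# Hypothetical toy data: each item has two views (e.g. one derived from its
# content and one from its relations); the ids and values are made up.
editor = [1.0, 0.0]
items = {
    "Q1": [[1.0, 0.0], [1.0, 0.0]],
    "Q2": [[0.0, 1.0], [0.0, 1.0]],
    "Q3": [[1.0, 0.0], [0.0, 1.0]],
}
gate = [0.0, 0.0]  # equal logits give equal weights of 0.5 per view

print(rank_items(editor, items, gate))  # ['Q1', 'Q3', 'Q2']
```

In this toy setting the mixed vectors score 1.0, 0.5, and 0.0 against the editor vector, which gives the ordering shown. A softmax gate is only one plausible way to parameterise the mixture weights; it is used here because it keeps the weights positive and normalised.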