The WSDM Cup 2017 was a data mining challenge held in conjunction with the 10th International Conference on Web Search and Data Mining (WSDM). It addressed two key challenges of today's knowledge bases: quality assurance and entity search. For quality assurance, we tackle the task of vandalism detection, based on a dataset of more than 82 million user-contributed revisions of the Wikidata knowledge base, all of which are annotated with regard to whether or not they are vandalism. For entity search, we tackle the task of triple scoring, using a dataset that comprises relevance scores for triples from type-like relations, including occupation and country of citizenship, based on about 10,000 human relevance judgements. For the sake of reproducibility, participants were asked to submit their software on TIRA, a cloud-based evaluation platform, and they were incentivized to share their approaches as open source.