We present RelSifter, a supervised learning approach to the problem of assigning relevance scores to triples expressing type-like relations such as 'profession' and 'nationality.' To provide additional contextual information about individuals and relations, we supplement the data provided as part of the WSDM 2017 Triple Score contest with Wikidata and DBpedia, two large-scale knowledge graphs (KGs). Our hypothesis is that any type relation, e.g., a specific profession like 'actor' or 'scientist,' can be described by the set of typical "activities" of people known to have that type relation. For example, actors are known to star in movies, and scientists are known for their academic affiliations. In a KG, this information can be found in a suitably defined subset of the second-degree neighbors of the type relation. This form of local information can be used as part of a learning algorithm to predict relevance scores for new, unseen triples. When scoring 'profession' and 'nationality' triples, our experiments based on this approach yield accuracies of 73% and 78%, respectively. These figures are roughly equivalent to, or only slightly below, the state of the art prior to the present contest. This suggests that our approach can be effective for evaluating facts, despite the skewness in the number of facts per individual mined from KGs.
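The neighborhood idea can be illustrated with a minimal sketch. This is not RelSifter's implementation: the toy triples, the predicate names, and the helper functions `activities`, `type_profile`, and `overlap_score` are all hypothetical, and a simple overlap ratio stands in for the learned model. People who share a profession are one hop from the profession node, so their other edges sit two hops away and serve as the "activities" that characterize it.

```python
from collections import Counter

# Toy knowledge graph as (subject, predicate, object) triples.
# All names and facts here are invented for illustration.
TRIPLES = [
    ("alice", "profession", "actor"),
    ("alice", "starred_in", "movie_1"),
    ("alice", "starred_in", "movie_2"),
    ("bob", "profession", "actor"),
    ("bob", "starred_in", "movie_3"),
    ("bob", "award_received", "award_1"),
    ("carol", "profession", "scientist"),
    ("carol", "affiliated_with", "university_1"),
    ("carol", "doctoral_advisor", "dave"),
    ("erin", "profession", "actor"),
    ("erin", "profession", "scientist"),
    ("erin", "affiliated_with", "university_2"),
    ("erin", "doctoral_advisor", "carol"),
    ("erin", "starred_in", "movie_4"),
]


def activities(person, type_relation="profession"):
    """Count the predicates on a person's outgoing edges, excluding the type relation."""
    return Counter(p for s, p, _ in TRIPLES if s == person and p != type_relation)


def type_profile(type_value, type_relation="profession", exclude=()):
    """Aggregate the activities of everyone known to have the given type value."""
    profile = Counter()
    for s, p, o in TRIPLES:
        if p == type_relation and o == type_value and s not in exclude:
            profile += activities(s, type_relation)
    return profile


def overlap_score(person, type_value):
    """Fraction of the person's activity edges whose predicate is typical of the type."""
    acts = activities(person)
    # Leave the person out of the profile so they do not vouch for themselves.
    profile = type_profile(type_value, exclude={person})
    total = sum(acts.values())
    if total == 0:
        return 0.0
    return sum(n for p, n in acts.items() if p in profile) / total


for profession in ("actor", "scientist"):
    print(profession, round(overlap_score("erin", profession), 2))
```

In this toy graph, "erin" shares more activity predicates with other scientists than with other actors, so the scientist triple gets the higher score. In a supervised setting, per-predicate counts of this kind would more plausibly serve as input features to a classifier trained on labeled relevance scores than be used as a score directly.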