The Resource Description Framework (RDF) is a framework for describing metadata, such as the attributes and relationships of resources on the Web. Machine learning on RDF graphs generally follows one of three approaches: (i) support vector machines (SVMs) with RDF graph kernels, (ii) RDF graph embeddings, and (iii) relational graph convolutional networks. In this paper, we propose a novel feature vector, called a Skip vector, that represents the features of each resource in an RDF graph by extracting various combinations of its neighboring edges and nodes. To make the Skip vector low-dimensional, we select the features that are important for classification tasks based on the information gain ratio of each feature. Classification can then be performed by feeding the low-dimensional Skip vector of each resource to conventional machine learning algorithms, such as SVMs, the k-nearest neighbors method, neural networks, random forests, and AdaBoost. In our evaluation experiments with RDF data from Wikidata, DBpedia, and YAGO, we compare our method with RDF graph kernels in an SVM. We also compare our method with the two other approaches, namely RDF graph embeddings such as RDF2vec and relational graph convolutional networks, on the AIFB, MUTAG, BGS, and AM benchmarks.
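The two steps named in the abstract, extracting neighborhood features and ranking them by information gain ratio, can be illustrated with a minimal sketch. This is not the paper's Skip vector construction, whose exact definition is not given here. It assumes a simplified stand-in in which each resource is described by binary features built from its outgoing edges, either the predicate alone or the predicate together with the neighboring node. The function names `neighbor_features`, `gain_ratio`, and `select_features`, as well as the toy triples, are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def neighbor_features(triples, resource):
    """Assumed, simplified features: outgoing edges of one resource.

    Each edge contributes the predicate alone (the neighboring node is
    skipped) and the predicate combined with the neighboring node.
    """
    feats = set()
    for s, p, o in triples:
        if s == resource:
            feats.add(("out", p))
            feats.add(("out", p, o))
    return feats


def gain_ratio(feature_values, labels):
    """Information gain of a feature divided by its split information."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_values)
    # A feature with a single value carries no information about the class.
    return info_gain / split_info if split_info > 0 else 0.0


def select_features(rows, labels, k):
    """Return the k feature names with the highest gain ratio.

    rows: one set of feature names per resource (presence = True).
    Ties are broken by feature name so the result is deterministic.
    """
    names = sorted(set().union(*rows))
    scored = [(gain_ratio([name in r for r in rows], labels), name)
              for name in names]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [name for _, name in scored[:k]]


# Toy data (hypothetical): four resources with a class label each.
triples = [
    ("a", "type", "Person"), ("b", "type", "Person"),
    ("c", "type", "Org"), ("d", "type", "Org"),
    ("a", "country", "JP"), ("b", "country", "US"),
    ("c", "country", "JP"), ("d", "country", "US"),
]
resources = ["a", "b", "c", "d"]
labels = ["x", "x", "y", "y"]
rows = [neighbor_features(triples, r) for r in resources]
selected = select_features(rows, labels, k=2)
```

The selected feature names define the columns of a low-dimensional binary vector per resource, which could then be passed to any conventional classifier. In this toy data the `type` features separate the two classes perfectly, whereas the `country` features do not.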