Learning subtree pattern importance for Weisfeiler-Lehman-based graph kernels

Graphs are a common representation of relational data, which are ubiquitous in many domains such as molecules and biological and social networks. A popular approach to learning with graph-structured data is to use graph kernels, which measure the similarity between graphs and are plugged into a kernel machine such as a support vector machine. Weisfeiler-Lehman (WL) based graph kernels, which employ the WL labeling scheme to extract subtree patterns and perform node embedding, have been shown to achieve strong performance while being efficiently computable. However, one of the main drawbacks of a general kernel is the decoupling of kernel construction from the learning process. For molecular graphs, common kernels such as the WL subtree kernel, which are based on substructures of the molecules, treat all available substructures as equally important, which might not be suitable in practice. In this paper, we propose a method to learn the weights of subtree patterns in the framework of WWL kernels, the state-of-the-art method for the graph classification task [14]. To overcome the computational issue on large-scale data sets, we present an efficient learning algorithm and also derive a generalization gap bound to show its convergence. Finally, through experiments on synthetic and real-world data sets, we demonstrate the effectiveness of our proposed method for learning the weights of subtree patterns.
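To make the notion of weighted subtree patterns concrete, the following is a minimal sketch, not the method proposed in this paper. It assumes the plain WL subtree kernel with a weighted dot product in place of the WWL framework, and the weights are set by hand rather than learned. The function names `wl_subtree_features` and `weighted_wl_kernel`, the adjacency-dictionary input format, and the toy graphs are all hypothetical choices made for illustration.

```python
from collections import Counter


def wl_subtree_features(adj, labels, iterations=1):
    """Count the WL subtree patterns of one graph.

    adj:    dict mapping each node to a list of its neighbours
    labels: dict mapping each node to its initial label
    Patterns are kept as explicit strings so that they are
    comparable across graphs without a shared hash table.
    """
    current = {v: str(labels[v]) for v in adj}
    counts = Counter(current.values())  # iteration 0: the node labels
    for _ in range(iterations):
        refined = {}
        for v in adj:
            # New label = own label plus the sorted multiset of neighbour labels.
            neighbours = sorted(current[u] for u in adj[v])
            refined[v] = current[v] + "(" + ",".join(neighbours) + ")"
        current = refined
        counts.update(current.values())
    return counts


def weighted_wl_kernel(f1, f2, weights=None):
    """Weighted dot product of two pattern-count vectors.

    Assumption: every pattern missing from `weights` has weight 1.0,
    which recovers the unweighted WL subtree kernel.
    """
    weights = weights or {}
    return sum(weights.get(p, 1.0) * c * f2[p] for p, c in f1.items() if p in f2)


# Toy graphs (hypothetical): a path A-B-A and a single edge A-B.
g1 = wl_subtree_features({0: [1], 1: [0, 2], 2: [1]}, {0: "A", 1: "B", 2: "A"})
g2 = wl_subtree_features({0: [1], 1: [0]}, {0: "A", 1: "B"})

uniform = weighted_wl_kernel(g1, g2)
reweighted = weighted_wl_kernel(g1, g2, weights={"A(B)": 3.0})
```

With uniform weights every shared pattern contributes equally to the similarity. Raising the weight of the pattern `A(B)` increases the kernel value only through the graphs that contain it, which is the kind of per-pattern importance the proposed method aims to learn from data.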
