Graphs are versatile tools for representing structured data. As a result, a variety of machine learning methods have been studied for graph data analysis. Although many such learning methods depend on the measurement of differences between input graphs, defining an appropriate distance metric for graphs remains a controversial issue. Hence, we propose a supervised distance metric learning method for the graph classification problem. Our method, named interpretable graph metric learning (IGML), learns discriminative metrics in a subgraph-based feature space, which has a strong graph representation capability. By introducing a sparsity-inducing penalty on the weight of each subgraph, IGML can identify a small number of important subgraphs that can provide insight into the given classification task. Because our formulation has a large number of optimization variables, an efficient algorithm that uses pruning techniques based on safe screening and working set selection methods is also proposed. An important property of IGML is that solution optimality is guaranteed because the problem is formulated as a convex problem and our pruning strategies only discard unnecessary subgraphs. Furthermore, we show that IGML is also applicable to other structured data such as itemset and sequence data, and that it can incorporate vertex-label similarity by using a transportation-based subgraph feature. We empirically evaluate the computational efficiency and classification performance of IGML on several benchmark datasets and provide some illustrative examples of how IGML identifies important subgraphs from a given graph dataset.
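To make the idea of a sparse, subgraph-weighted metric concrete, the following is a minimal sketch, not the IGML implementation. It assumes that each graph is represented by the set of subgraphs it contains (a binary indicator feature) and that the learned metric is a nonnegative weight per subgraph. The abstract does not state this exact form. The names `weighted_subgraph_distance` and `selected_subgraphs`, the subgraph labels, and the weight values are all hypothetical.

```python
from typing import Dict, List, Set


def weighted_subgraph_distance(
    feats_a: Set[str], feats_b: Set[str], weights: Dict[str, float]
) -> float:
    """Weighted squared distance over binary subgraph-indicator features.

    Assumed form (illustrative only): sum over subgraphs H of
    w_H * (x_H(a) - x_H(b))^2, with x_H in {0, 1} and w_H >= 0.
    """
    total = 0.0
    for subgraph, w in weights.items():
        if w == 0.0:
            # A zero weight removes the subgraph from the metric entirely.
            continue
        xa = 1.0 if subgraph in feats_a else 0.0
        xb = 1.0 if subgraph in feats_b else 0.0
        total += w * (xa - xb) ** 2
    return total


def selected_subgraphs(weights: Dict[str, float]) -> List[str]:
    """Return the subgraphs that keep a nonzero weight, in sorted order."""
    return sorted(s for s, w in weights.items() if w > 0.0)


# Hypothetical weights, as a sparsity-inducing penalty might leave them:
# one subgraph has been driven exactly to zero.
weights = {"C-O": 0.75, "C-N": 0.0, "C-C-O": 0.5}
g1 = {"C-O", "C-N"}    # subgraphs present in the first graph
g2 = {"C-N", "C-C-O"}  # subgraphs present in the second graph

distance = weighted_subgraph_distance(g1, g2, weights)
important = selected_subgraphs(weights)
```

Under these assumptions, only subgraphs with nonzero weight contribute to the distance, which is why a sparse weight vector can be read directly as a short list of important subgraphs.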