Frequent and structurally related subgraphs, also known as network motifs, are valuable features of many graph datasets. However, the high computational complexity of identifying motif sets in arbitrary datasets (motif mining) has limited their use on many real-world datasets. By automatically leveraging statistical properties of datasets, machine learning approaches have shown promise in several tasks with combinatorial complexity and are therefore natural candidates for network motif mining. In this work we seek to facilitate the development of machine learning approaches aimed at motif mining. We propose a formulation of the motif mining problem as a node labelling task. In addition, we build benchmark datasets and evaluation metrics which test the ability of models to capture different aspects of motif discovery such as motif number, size, topology, and scarcity. Next, we propose MotiFiesta, a first attempt at solving this problem in a fully differentiable manner, with promising results on challenging baselines. Finally, we demonstrate through MotiFiesta that this learning setting can be applied simultaneously to general-purpose data mining and interpretable feature extraction for graph classification tasks.