Graph classification, which aims to identify the category labels of graphs, plays a significant role in drug classification, toxicity detection, protein analysis, and related tasks. However, the limited scale of the benchmark datasets makes graph classification models prone to over-fitting and undergeneralization. To address this, we introduce data augmentation on graphs (i.e., graph augmentation) and present four methods: random mapping, vertex-similarity mapping, motif-random mapping, and motif-similarity mapping, which generate additional weakly labeled data for small-scale benchmark datasets via heuristic transformation of graph structures. Furthermore, we propose a generic model evolution framework, named M-Evolve, which combines graph augmentation, data filtration, and model retraining to optimize pre-trained graph classifiers. Experiments on six benchmark datasets demonstrate that the proposed framework helps existing graph classification models alleviate over-fitting and undergeneralization when trained on small-scale benchmark datasets, yielding an average accuracy improvement of 3-13% on graph classification tasks.
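As a rough illustration of what a heuristic transformation of graph structure might look like, the sketch below implements one plausible reading of random mapping: a fraction of existing edges is removed and the same number of currently absent edges is added, so the edge count is preserved. The function name `random_mapping`, the `budget` parameter, and the edge-list representation are assumptions made for this example and are not taken from the abstract; the augmented graph would then inherit its source graph's label as a weak label.

```python
import itertools
import random


def random_mapping(num_nodes, edges, budget=0.2, seed=None):
    """Return a perturbed edge list for an undirected simple graph.

    Assumption (not stated in the abstract): a fraction `budget` of the
    existing edges is removed and an equal number of absent edges is added.
    """
    rng = random.Random(seed)
    # Normalise each undirected edge to a sorted (u, v) tuple with u < v.
    existing_set = {tuple(sorted(e)) for e in edges}
    existing = sorted(existing_set)
    # Candidate edges to add: all node pairs that are not currently connected.
    absent = [p for p in itertools.combinations(range(num_nodes), 2)
              if p not in existing_set]
    # Number of edges to swap, capped by what is actually available.
    k = min(int(round(budget * len(existing))), len(existing), len(absent))
    removed = set(rng.sample(existing, k))
    added = set(rng.sample(absent, k))
    return sorted((existing_set - removed) | added)


# Usage: perturb a 5-cycle by swapping 40% of its edges.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
augmented = random_mapping(5, edges, budget=0.4, seed=0)
```

The similarity-based and motif-based variants named in the abstract would presumably replace the uniform sampling with a guided choice of which edges to remove and add; that selection logic is not described here and is left out of the sketch.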