Graph property prediction tasks are important and numerous. While each task offers only a small number of labeled examples, unlabeled graphs have been collected from various sources and at a large scale. A conventional approach is to train a model on the unlabeled graphs with self-supervised tasks and then fine-tune it on the prediction tasks. However, the knowledge learned from the self-supervised tasks may not align with, and sometimes conflicts with, what the predictions need. In this paper, we propose to extract the knowledge underlying the large set of unlabeled graphs as a specific set of useful data points that augment each property prediction model. We use a diffusion model to fully utilize the unlabeled graphs and design two new objectives that guide the model's denoising process with each task's labeled data, so that it generates task-specific graph examples and their labels. Experiments demonstrate that our data-centric approach performs significantly better than fourteen existing methods of various kinds on fifteen tasks. Unlike in self-supervised learning, the performance improvement brought by unlabeled data is visible in the form of the generated labeled examples.
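The abstract does not spell out the two guidance objectives, so the following is only a generic, hypothetical sketch of the underlying idea: steering a learned denoising/score process toward task-relevant samples and labeling what it produces. It is not the paper's method. Assumptions, all for illustration: small real vectors stand in for graphs, an analytic Gaussian score stands in for a diffusion model trained on unlabeled graphs, and a hypothetical linear predictor `w` with target value `y_target` defines the task objective. Sampling uses Langevin dynamics with an added guidance gradient.

```python
import math
import random


def prior_score(x, mu):
    # Score (gradient of the log-density) of a unit Gaussian centered at mu.
    # Hypothetical stand-in for the score a diffusion model learns from unlabeled data.
    return [m - xi for xi, m in zip(x, mu)]


def task_grad(x, w, y_target):
    # Gradient of the hypothetical task objective -(w.x - y_target)^2,
    # which rewards samples whose predicted property is close to y_target.
    pred = sum(wi * xi for wi, xi in zip(w, x))
    return [-2.0 * (pred - y_target) * wi for wi in w]


def guided_sample(mu, w, y_target, guidance, rng, steps=500, step_size=0.01):
    # Start from pure noise and run guided Langevin updates:
    # x <- x + eta * (prior score + guidance * task gradient) + sqrt(2 * eta) * z
    x = [rng.gauss(0.0, 1.0) for _ in mu]
    noise_scale = math.sqrt(2.0 * step_size)
    for _ in range(steps):
        s = prior_score(x, mu)
        g = task_grad(x, w, y_target)
        x = [
            xi + step_size * (si + guidance * gi) + noise_scale * rng.gauss(0.0, 1.0)
            for xi, si, gi in zip(x, s, g)
        ]
    # Label the generated example with the (hypothetical) task predictor.
    label = sum(wi * xi for wi, xi in zip(w, x))
    return x, label


def augment(labeled, n_new, mu, w, y_target, guidance=1.0, seed=0):
    # Append generated (example, label) pairs to the task's original labeled data.
    rng = random.Random(seed)  # one shared generator so samples differ
    generated = [guided_sample(mu, w, y_target, guidance, rng) for _ in range(n_new)]
    return labeled + generated
```

With `mu=[0.0, 0.0]`, `w=[1.0, 0.0]`, `y_target=3.0` and `guidance=1.0`, the stationary density along the first coordinate is proportional to `exp(-x**2 / 2 - (x - 3)**2)`, so generated labels concentrate around 2, a compromise between the unlabeled-data prior (mean 0) and the task target. Setting `guidance=0.0` simply reproduces the prior.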