Supervised learning, while prevalent for information cascade modeling, often requires abundant labeled data for training, and the trained models do not generalize easily across tasks and datasets. Such models often learn task-specific representations, which can easily overfit downstream tasks. Recently, self-supervised learning has been proposed to alleviate these two fundamental issues in linguistic and visual tasks. However, its direct applicability to information cascade modeling, especially cascade graph-related tasks, remains underexplored. In this work, we present Contrastive Cascade Graph Learning (CCGL), a novel framework for information cascade graph learning in a contrastive, self-supervised, and task-agnostic way. Specifically, CCGL first designs an effective data augmentation strategy that captures variation and uncertainty by simulating information diffusion in graphs. Second, it learns a generic model for cascade graph tasks via self-supervised contrastive pre-training on both unlabeled and labeled data. Third, CCGL learns a task-specific cascade model via fine-tuning on labeled data. Finally, to make the model transferable across datasets and cascade applications, CCGL further enhances it via distillation with a teacher-student architecture. We demonstrate that CCGL significantly outperforms its supervised and semi-supervised counterparts on several downstream tasks.
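The abstract does not spell out the contrastive objective; below is a minimal sketch, assuming a SimCLR-style NT-Xent loss over two stochastically augmented views of each cascade graph embedding. All names here (`nt_xent_loss`, `encoder`, `projection_head`, the temperature value) are illustrative assumptions for exposition, not the authors' actual API.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """SimCLR-style NT-Xent loss over two batches of projected embeddings.

    z1, z2: (batch, dim) projections of two augmented views of the same
    cascade graphs. Positive pairs are (z1[i], z2[i]); all other in-batch
    embeddings act as negatives. (Hypothetical sketch, not the paper's code.)
    """
    batch = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2B, d), unit-norm rows
    sim = z @ z.t() / temperature                        # scaled cosine similarities
    # Mask self-similarities so an embedding is never its own candidate.
    mask = torch.eye(2 * batch, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float('-inf'))
    # Row i's positive sits at index (i + B) mod 2B.
    targets = (torch.arange(2 * batch, device=z.device) + batch) % (2 * batch)
    return F.cross_entropy(sim, targets)

# One hypothetical pre-training step:
#   g1, g2 = augment(cascade_graph), augment(cascade_graph)  # two diffusion-simulated views
#   z1 = projection_head(encoder(g1))
#   z2 = projection_head(encoder(g2))
#   loss = nt_xent_loss(z1, z2)
```

Under this reading, the fine-tuning and distillation stages would reuse `encoder` while discarding `projection_head`, as is common in contrastive pre-training pipelines; the paper itself should be consulted for CCGL's exact architecture.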