Machine learning models that learn from dynamic graphs face nontrivial challenges in learning and inference because both nodes and edges change over time. The large-scale graph benchmark datasets widely used by the community are static and primarily focus on homogeneous node and edge attributes. In this work, we present a collection of large-scale, dynamic, heterogeneous academic graphs to test the effectiveness of models developed for multi-step graph forecasting tasks. Our novel datasets cover both context and content information extracted from scientific publications across two communities: Artificial Intelligence (AI) and Nuclear Nonproliferation (NN). In addition, we propose a systematic approach to improve the existing evaluation procedures used for graph forecasting models.