Artificial intelligence for graphs has achieved remarkable success in modeling complex systems, ranging from dynamic networks in biology to interacting particle systems in physics. However, increasingly heterogeneous graph datasets call for multimodal methods that can combine different inductive biases: the sets of assumptions that algorithms use to make predictions for inputs they have not encountered during training. Learning on multimodal datasets presents fundamental challenges because inductive biases can vary across data modalities and graphs might not be explicitly given in the input. To address these challenges, multimodal graph AI methods combine multiple modalities while leveraging cross-modal dependencies through graphs. Diverse datasets are combined using graphs and fed into sophisticated multimodal architectures, categorized as image-intensive, knowledge-grounded and language-intensive models. Using this categorization, we introduce a blueprint for multimodal graph learning, use it to study existing methods and provide guidelines for designing new models.