Graph-centric artificial intelligence (graph AI) has achieved remarkable success in modeling interacting systems prevalent in nature, from dynamical systems in biology to particle physics. The increasing heterogeneity of graph datasets calls for neural architectures that can combine multiple inductive biases. However, combining data from various sources is challenging because the appropriate inductive bias may vary by data modality. Multimodal learning methods address this challenge by fusing multiple data modalities while leveraging cross-modal dependencies. Here, we survey 140 studies in graph-centric AI and find that diverse data types are increasingly brought together using graphs and fed into sophisticated multimodal models. These models stratify into image-, language-, and knowledge-grounded multimodal learning. Based on this categorization, we put forward an algorithmic blueprint for multimodal graph learning. The blueprint serves as a way to group state-of-the-art architectures according to how each treats multimodal data through its choice of four distinct components. This effort can pave the way toward standardizing the design of sophisticated multimodal architectures for highly complex real-world problems.
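The abstract does not name the four blueprint components. As a minimal sketch only, assuming a hypothetical decomposition into entity identification, graph construction, message passing, and cross-modal fusion (all four names and every function below are illustrative, not the survey's terminology), a multimodal graph learning pipeline might be organized as follows:

```python
import numpy as np

# Illustrative sketch of a multimodal graph learning pipeline with four
# pluggable components. The component names are assumptions for this
# example, not the blueprint's actual terminology.

def identify_entities(image_patches, text_tokens):
    """Component 1 (assumed): map raw modalities to node feature vectors.

    Both modalities are assumed to be pre-embedded to the same dimension.
    """
    return np.vstack([image_patches, text_tokens])

def build_graph(nodes, k=3):
    """Component 2 (assumed): infer topology, e.g. a k-nearest-neighbor graph."""
    dists = np.linalg.norm(nodes[:, None] - nodes[None, :], axis=-1)
    adj = np.zeros_like(dists)
    for i, row in enumerate(dists):
        for j in np.argsort(row)[1:k + 1]:  # skip index 0, the node itself
            adj[i, j] = adj[j, i] = 1.0
    return adj

def propagate(nodes, adj, steps=2):
    """Component 3 (assumed): propagate information via mean aggregation."""
    deg = adj.sum(axis=1, keepdims=True) + 1e-8
    h = nodes
    for _ in range(steps):
        h = np.tanh((adj @ h) / deg + h)  # neighbor average plus residual
    return h

def fuse(h, n_image):
    """Component 4 (assumed): mix per-modality representations into one vector."""
    return np.concatenate([h[:n_image].mean(axis=0), h[n_image:].mean(axis=0)])

# Usage: 4 image-patch nodes and 3 text-token nodes, each 8-dimensional.
rng = np.random.default_rng(0)
img, txt = rng.normal(size=(4, 8)), rng.normal(size=(3, 8))
nodes = identify_entities(img, txt)
emb = fuse(propagate(nodes, build_graph(nodes)), n_image=len(img))
print(emb.shape)  # (16,): a joint multimodal embedding
```

Under this reading, a concrete architecture corresponds to one choice per component: swapping the k-nearest-neighbor graph for a given relational structure, or the mean aggregator for attention, yields a different point in the same design space.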