Multimodal knowledge graph link prediction aims to improve the accuracy and efficiency of link prediction tasks on multimodal data. However, given complex multimodal information and sparse training data, most existing methods find it difficult to achieve interpretability and high accuracy simultaneously. To address this difficulty, a new model is developed in this paper, namely Interpretable Multimodal Knowledge Graph Answer Prediction via Sequence Modeling (IMKGA-SM). First, a multimodal fine-grained fusion method is proposed, in which VGG16 and Optical Character Recognition (OCR) techniques are adopted to effectively extract visual features and the text embedded in images. Then, the knowledge graph link prediction task is modelled as an offline reinforcement learning Markov decision process, which is abstracted into a unified sequence framework. An interactive-perception-based reward expectation mechanism and a special causal masking mechanism are designed, which ``convert'' the query into an inference path. Next, an autoregressive dynamic gradient adjustment mechanism is proposed to alleviate the problem of insufficient multimodal optimization. Finally, experiments are conducted on two datasets, with popular state-of-the-art (SOTA) baselines used for comparison. The results show that the developed IMKGA-SM achieves much better performance than SOTA baselines on multimodal link prediction datasets of different sizes.
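The following is a minimal illustrative sketch of the fine-grained multimodal extraction step described above, assuming torchvision's pretrained VGG16 and the pytesseract OCR wrapper as stand-ins for the paper's components; it is not the authors' implementation, and the helper name extract_image_features is hypothetical.

# Sketch of per-entity-image feature extraction: VGG16 visual features
# plus OCR text recovered from the image. Illustrative assumption only,
# not the IMKGA-SM code; torchvision and pytesseract are assumed stand-ins.
import torch
from torchvision import models, transforms
from PIL import Image
import pytesseract

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Pretrained VGG16 used purely as a frozen visual feature extractor.
vgg16 = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()

def extract_image_features(path: str):
    """Return (visual_feature, ocr_text) for one entity image."""
    image = Image.open(path).convert("RGB")
    with torch.no_grad():
        x = preprocess(image).unsqueeze(0)
        feats = vgg16.features(x)
        feats = vgg16.avgpool(feats).flatten(1)
        # Take the 4096-d penultimate-layer activation as the visual feature.
        visual = vgg16.classifier[:-1](feats)      # shape: (1, 4096)
    # OCR recovers any text rendered inside the image itself.
    ocr_text = pytesseract.image_to_string(image)
    return visual, ocr_text

In a fusion setting of this kind, the visual vector and the OCR string would then be embedded and combined with the entity's structural and textual features before sequence modeling; the exact fusion scheme is specified in the paper, not here.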