Multimodal knowledge graph link prediction aims to improve the accuracy and efficiency of link prediction tasks for multimodal data. However, with complex multimodal information and sparse training data, most existing methods struggle to achieve interpretability and high accuracy simultaneously. To address this difficulty, a new model is developed in this paper, namely Interpretable Multimodal Knowledge Graph Answer Prediction via Sequence Modeling (IMKGA-SM). First, a multimodal fine-grained fusion method is proposed, in which VGG16 and Optical Character Recognition (OCR) techniques are adopted to effectively extract visual and textual information from images. Then, the knowledge graph link prediction task is modeled as an offline reinforcement learning Markov decision process, which is then abstracted into a unified sequence framework. An interactive perception-based reward expectation mechanism and a special causal masking mechanism are designed, which "convert" the query into an inference path. Next, an autoregressive dynamic gradient adjustment mechanism is proposed to alleviate the problem of insufficient multimodal optimization. Finally, two datasets are adopted for experiments, and popular state-of-the-art (SOTA) baselines are used for comparison. The results show that the developed IMKGA-SM achieves much better performance than the SOTA baselines on multimodal link prediction datasets of different sizes.
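The abstract mentions that VGG16 and OCR are used to extract visual and textual information from entity images. The sketch below illustrates one plausible way to set up such fine-grained feature extraction; it is not the authors' implementation, and the library choices (torchvision for the VGG16 backbone, pytesseract for OCR) and the helper `extract_entity_features` are assumptions made for illustration only.

```python
# Minimal sketch (assumed, not the authors' code): VGG16 visual features plus
# OCR text for a single entity image, as described in the abstract.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image
import pytesseract  # assumed OCR backend for illustration

# VGG16 backbone: drop the classification head and keep the flattened
# convolutional features (512 x 7 x 7 = 25088-d) as the visual embedding.
vgg16 = models.vgg16(weights=models.VGG16_Weights.DEFAULT)
vgg16.classifier = torch.nn.Identity()
vgg16.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_entity_features(image_path: str):
    """Return (visual_embedding, ocr_text) for one entity image."""
    image = Image.open(image_path).convert("RGB")
    with torch.no_grad():
        visual = vgg16(preprocess(image).unsqueeze(0)).squeeze(0)
    ocr_text = pytesseract.image_to_string(image)  # text rendered inside the image
    return visual, ocr_text
```

In a fine-grained fusion setting, the visual embedding and the OCR-derived text (after encoding) would then be combined with the structural entity representation before sequence modeling; the exact fusion operator is specified in the paper itself.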