Multi-modal pre-training and knowledge discovery are two important research topics in multi-modal machine learning. However, no existing work attempts to link knowledge discovery with knowledge-guided multi-modal pre-training. In this paper, we propose to unify the two in a continuous learning framework so that each improves the other. Taking open-domain uni-modal datasets of images and texts as input, we maintain a knowledge graph as the foundation that supports both tasks. For knowledge discovery, a pre-trained model is used to identify cross-modal links on the graph. For model pre-training, the knowledge graph serves as external knowledge that guides the model update. These two steps are performed iteratively in our framework for continuous learning. Experimental results on MS-COCO and Flickr30K, for both knowledge discovery and the pre-trained model, validate the effectiveness of our framework.
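The alternation between knowledge discovery and knowledge-guided updating described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the embeddings are plain lists of floats, "discovery" is a cosine-similarity threshold, the "model update" simply pulls linked embeddings toward each other, and the names `discover_links`, `update_model`, and `continuous_learning` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def discover_links(img_emb, txt_emb, links, threshold):
    """Knowledge discovery (toy stand-in): add an image-text edge to the
    graph whenever the current embeddings score at or above the threshold."""
    new_links = set(links)
    for img_id, u in img_emb.items():
        for txt_id, v in txt_emb.items():
            if cosine(u, v) >= threshold:
                new_links.add((img_id, txt_id))
    return new_links


def update_model(img_emb, txt_emb, links, lr):
    """Knowledge-guided update (toy stand-in): move each linked pair of
    embeddings a fraction `lr` of the way toward each other."""
    for img_id, txt_id in links:
        u, v = img_emb[img_id], txt_emb[txt_id]
        img_emb[img_id] = [a + lr * (b - a) for a, b in zip(u, v)]
        txt_emb[txt_id] = [b + lr * (a - b) for a, b in zip(u, v)]


def continuous_learning(img_emb, txt_emb, rounds=3, threshold=0.9, lr=0.25):
    """Alternate discovery and model updating for a fixed number of rounds.
    Returns the set of cross-modal links accumulated in the graph."""
    links = set()
    for _ in range(rounds):
        links = discover_links(img_emb, txt_emb, links, threshold)
        update_model(img_emb, txt_emb, links, lr)
    return links
```

In this sketch the graph only grows, and each round's update uses every link found so far, which mirrors the mutual-improvement loop at a very coarse level. A real system would replace the threshold rule and the embedding pull with the pre-trained model's link predictor and a proper training objective.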