A visual-relational knowledge graph (KG) is a multi-relational graph whose entities are associated with images. We introduce ImageGraph, a KG with 1,330 relation types, 14,870 entities, and 829,931 images. Visual-relational KGs lead to novel probabilistic query types where images are treated as first-class citizens. Both the prediction of relations between unseen images and multi-relational image retrieval can be formulated as query types in a visual-relational KG. We approach the problem of answering such queries with a novel combination of deep convolutional networks and models for learning knowledge graph embeddings. The resulting models can answer queries such as "How are these two unseen images related to each other?" We also explore a zero-shot learning scenario where an image of an entirely new entity is linked with multiple relations to entities of an existing KG. The multi-relational grounding of unseen entity images into a knowledge graph serves as the description of such an entity. We conduct experiments to demonstrate that the proposed deep architectures in combination with KG embedding objectives can answer the visual-relational queries efficiently and accurately.
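The query "How are these two unseen images related?" can be sketched as follows. This is a minimal illustration, not the paper's actual architecture: it assumes a TransE-style scoring function over image embeddings, and the CNN is replaced by a placeholder linear projection; all names and dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 16               # KG embedding dimension (assumed, for illustration)
FEAT_DIM = 2048      # size of the CNN's image feature vector (assumed)
NUM_RELATIONS = 5    # toy relation count; the paper's KG has 1,330

# Stand-in for a trained projection from CNN features into the KG
# embedding space; in the paper this is learned end-to-end.
W = rng.normal(size=(FEAT_DIM, D)) * 0.01
relations = rng.normal(size=(NUM_RELATIONS, D))  # relation embeddings

def embed(cnn_features: np.ndarray) -> np.ndarray:
    """Map raw CNN image features into the KG embedding space."""
    return cnn_features @ W

def score_relations(head_feats: np.ndarray, tail_feats: np.ndarray) -> np.ndarray:
    """TransE-style plausibility scores -||h + r - t|| for every
    relation r; a higher score means the relation is more plausible
    between the two images."""
    h, t = embed(head_feats), embed(tail_feats)
    return -np.linalg.norm(h[None, :] + relations - t[None, :], axis=1)

# Two unseen images, represented by (placeholder) CNN feature vectors:
head = rng.normal(size=FEAT_DIM)
tail = rng.normal(size=FEAT_DIM)
scores = score_relations(head, tail)
predicted = int(np.argmax(scores))  # index of the best-scoring relation
```

Ranking all relations by this score answers the relation-prediction query; ranking all entity embeddings against a fixed image and relation would, analogously, give multi-relational image retrieval.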