As a scene graph compactly summarizes the high-level content of an image in a structured and symbolic manner, the similarity between the scene graphs of two images reflects the relevance of their contents. Based on this idea, we propose a novel approach to image-to-image retrieval using scene graph similarity measured by graph neural networks. In our approach, graph neural networks are trained to predict a proxy image relevance measure, computed from human-annotated captions using a pre-trained sentence similarity model. We collect and publish a dataset of image relevance scores measured by human annotators to evaluate retrieval algorithms. On the collected dataset, our method agrees better with the human perception of image similarity than other competitive baselines.
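As a rough illustration of the proxy relevance target described above, the sketch below scores the similarity of two captions with cosine similarity over bag-of-words counts. This is only a toy stand-in: the actual method uses a pre-trained sentence similarity model over human-annotated captions, and the function name here is hypothetical.

```python
from collections import Counter
import math

def caption_similarity(a: str, b: str) -> float:
    """Toy stand-in for a pre-trained sentence similarity model:
    cosine similarity of bag-of-words term counts. The proxy image
    relevance in the paper would instead come from learned sentence
    embeddings of the two images' captions."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

In the full approach, such caption-level scores serve as supervision targets so that a graph neural network can learn to produce matching similarities directly from the two images' scene graphs.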