We explore the use of knowledge graphs that capture general or commonsense knowledge to augment the information extracted from images by state-of-the-art image captioning methods. Our experiments on several benchmark data sets, including MS COCO, measured by CIDEr-D, a performance metric for image captioning, show that variants of state-of-the-art captioning methods that use information extracted from knowledge graphs can substantially outperform those that rely solely on information extracted from images.