Interest in image scene graph generation (object, attribute, and relationship detection) has surged, driven by the need to build fine-grained image understanding models that go beyond object detection. Because a good benchmark is lacking, the reported results of different scene graph generation models are not directly comparable, which impedes research progress. We have developed a much-needed scene graph generation benchmark based on maskrcnn-benchmark and several popular models. This paper presents the main features of our benchmark and a comprehensive ablation study of scene graph generation models on the Visual Genome and OpenImages Visual Relationship Detection datasets. Our codebase is publicly available at https://github.com/microsoft/scene_graph_benchmark.