Driven by successes in deep learning, computer vision research has begun to move beyond object detection and image classification to more sophisticated tasks such as image captioning and visual question answering. Motivating such endeavors is the desire for models to capture not only the objects present in an image, but also more fine-grained aspects of a scene, such as the relationships between objects and their attributes. Scene graphs provide a formal construct for capturing these aspects of an image. Despite this, there have been only a few recent efforts to generate scene graphs from imagery. Previous works limit themselves to settings where bounding box information is available at training time, and they do not attempt to generate scene graphs with attributes. In this paper we propose a method, based on recent advancements in Generative Adversarial Networks, that overcomes these deficiencies. We first generate small subgraphs, each describing a single statement about the scene drawn from a specific region of the input image chosen by an attention mechanism. In this way, our method produces portions of the scene graph, including attribute information, without the need for bounding box labels. The complete scene graph is then constructed from these subgraphs. We show that our model improves upon prior work in scene graph generation on state-of-the-art datasets under accepted metrics. Further, we demonstrate that our model can handle a larger vocabulary than prior work has attempted.
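The abstract does not say how the subgraphs are combined into the complete scene graph. As a minimal sketch, assuming each subgraph is a single (subject, predicate, object) statement carrying attribute labels for its two nodes, and assuming that nodes sharing a label denote the same object (a simplification that cannot tell two distinct "man" instances apart), the assembly step could look like the following. The names `Subgraph`, `SceneGraph`, and `merge_subgraphs` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Subgraph:
    """Hypothetical single statement about a scene: a relation triple plus
    attribute labels for the subject and object nodes."""
    subject: str
    predicate: str
    obj: str
    subject_attrs: Tuple[str, ...] = ()
    object_attrs: Tuple[str, ...] = ()


@dataclass
class SceneGraph:
    """Hypothetical scene graph: node label -> attribute set, plus relation edges."""
    attributes: Dict[str, Set[str]] = field(default_factory=dict)
    relations: Set[Tuple[str, str, str]] = field(default_factory=set)


def merge_subgraphs(subgraphs: List[Subgraph]) -> SceneGraph:
    """Union per-region statements into one graph.

    Assumption (not from the paper): nodes with identical labels are treated
    as the same object, so their attributes are merged and duplicate relation
    triples collapse into a single edge.
    """
    graph = SceneGraph()
    for sg in subgraphs:
        # Create the node if needed, then add any attributes this statement carries.
        graph.attributes.setdefault(sg.subject, set()).update(sg.subject_attrs)
        graph.attributes.setdefault(sg.obj, set()).update(sg.object_attrs)
        # Relations are stored as a set, so repeated statements are deduplicated.
        graph.relations.add((sg.subject, sg.predicate, sg.obj))
    return graph
```

For example, merging `Subgraph("man", "riding", "horse", ("young",), ("brown",))`, `Subgraph("horse", "on", "grass", (), ("green",))`, and `Subgraph("man", "riding", "horse", (), ("large",))` yields three nodes (`man` with `{"young"}`, `horse` with `{"brown", "large"}`, `grass` with `{"green"}`) and two distinct relations, since the repeated "man riding horse" statement collapses into one edge. A real system would need instance-level identity rather than label matching to handle scenes containing several objects of the same class.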