A visual relationship denotes a relationship between two objects in an image and can be represented as a triplet (subject; predicate; object). Visual relationship detection is crucial for scene understanding in images. Existing visual relationship detection datasets contain only true relationships that correctly describe the content of an image. However, distinguishing false visual relationships from true ones is also crucial for image understanding and grounded natural language processing. In this paper, we construct a visual relationship authenticity dataset in which both true and false relationships among all objects that appear in the captions of the Flickr30k Entities image caption dataset are annotated. The dataset is available at https://github.com/codecreator2053/VR_ClassifiedDataset. We hope that this dataset can promote research on both vision and language understanding.
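To make the triplet representation and the true/false annotation concrete, here is a minimal sketch that models one labeled relationship as a Python dataclass and separates true relationships from false ones. The class name `Relationship`, its fields, the helper `split_by_authenticity`, and the example triplets are all hypothetical illustrations. They are not the released dataset's actual file format or contents.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Relationship:
    """A (subject; predicate; object) triplet with an authenticity label.

    Hypothetical schema for illustration only; not the dataset's real format.
    """
    subject: str
    predicate: str
    obj: str
    is_true: bool  # True if the triplet correctly describes the image


def split_by_authenticity(
    rels: List[Relationship],
) -> Tuple[List[Relationship], List[Relationship]]:
    """Return (true relationships, false relationships), preserving order."""
    true_rels = [r for r in rels if r.is_true]
    false_rels = [r for r in rels if not r.is_true]
    return true_rels, false_rels


# Made-up annotations for a single image, for illustration only.
annotations = [
    Relationship("man", "riding", "horse", True),
    Relationship("horse", "riding", "man", False),  # same objects, false relation
    Relationship("man", "wearing", "hat", True),
]

true_rels, false_rels = split_by_authenticity(annotations)
print(len(true_rels), len(false_rels))  # 2 1
```

The second example shows why authenticity labels matter: the subject and object both appear in the image, yet swapping their roles yields a triplet that does not describe the scene.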