Scene graph construction, also called visual relationship detection, aims to give a precise structural description of the objects in an image (nodes) and the relationships between them (edges). Object detection and relationship detection can reinforce each other, and exploiting this mutual promotion is important for improving the performance of both. In this work, we propose a new framework, called semantics guided graph relation neural network (SGRN), for effective visual relationship detection. First, to boost object detection accuracy, we introduce a source-target class cognoscitive transformation that maps the features of co-occurring objects into the target object domain to refine the target's visual features. Similar source-target cognoscitive transformations are used to refine object features from relation features, and vice versa. Second, to boost relation detection accuracy, we complement the visual features of the paired objects by separately embedding the class probabilities of the subject and the object, which provides high-level semantic information. In addition, to reduce the search space of relationships, we design a semantics-aware relationship filter that excludes object pairs that have no relation. We evaluate our approach on the Visual Genome dataset, where it achieves state-of-the-art performance for visual relationship detection. Our approach also significantly improves object detection performance (a gain of 4.2\% in mAP).
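As a rough illustration of the second and third ideas above (embedding subject and object class probabilities alongside visual features, then filtering out unrelated pairs), the following is a minimal sketch and not the SGRN implementation. The function names, the linear embedding, and the logistic scoring rule are all assumptions made for this example; the abstract does not specify how the embedding or the filter is parameterized.

```python
import math
from itertools import permutations


def embed_class_probs(probs, embedding):
    """Project a class-probability vector into a dense semantic vector.

    Assumption: a plain linear embedding (probs times an embedding matrix
    with one row per class). The actual model may use something else.
    """
    dim = len(embedding[0])
    return [sum(p * row[d] for p, row in zip(probs, embedding)) for d in range(dim)]


def build_pair_features(visual, class_probs, embedding):
    """Build one feature vector per ordered (subject, object) pair.

    Each pair feature concatenates the two visual feature vectors with the
    separately embedded class probabilities of the subject and the object.
    """
    semantic = [embed_class_probs(p, embedding) for p in class_probs]
    pairs = list(permutations(range(len(visual)), 2))
    feats = [visual[s] + visual[o] + semantic[s] + semantic[o] for s, o in pairs]
    return pairs, feats


def filter_pairs(pairs, feats, weights, bias, threshold=0.5):
    """Keep only pairs whose hypothetical 'has a relation' score passes the threshold.

    Assumption: the score is a logistic function of a linear combination of
    the pair features; the weights would be learned in a real system.
    """
    kept = []
    for pair, feat in zip(pairs, feats):
        logit = sum(w * x for w, x in zip(weights, feat)) + bias
        score = 1.0 / (1.0 + math.exp(-logit))
        if score >= threshold:
            kept.append(pair)
    return kept


# Toy usage: 3 detected objects, 2-d visual features, 2 classes, 3-d embedding.
visual = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
class_probs = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
embedding = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

pairs, feats = build_pair_features(visual, class_probs, embedding)
kept = filter_pairs(pairs, feats, weights=[0.0] * len(feats[0]), bias=0.0)
```

Filtering before relation classification matters because the number of ordered pairs grows quadratically with the number of detected objects; discarding unlikely pairs early shrinks the set that the relation classifier has to score.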