Scene Graph Generation (SGG) aims to build a structured representation of a scene in terms of objects and their pairwise relationships, which benefits downstream tasks. However, current SGG methods usually suffer from sub-optimal scene graph generation because of the long-tailed distribution of the training data. To address this problem, we propose Resistance Training using Prior Bias (RTPB) for scene graph generation. Specifically, RTPB uses a distribution-based prior bias to improve the model's ability to detect less frequent relationships during training, thus improving its generalizability on tail categories. In addition, to further exploit the contextual information of objects and relationships, we design a contextual encoding backbone network, termed Dual Transformer (DTrans). We perform extensive experiments on the popular VG150 benchmark to demonstrate the effectiveness of our method for unbiased scene graph generation. In particular, our RTPB achieves an improvement of over 10% in mean recall when applied to current SGG methods. Furthermore, DTrans with RTPB outperforms nearly all state-of-the-art methods by a large margin.
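To make the idea of a training-time prior bias concrete, below is a minimal PyTorch sketch, assuming the bias is the tempered log-frequency of each relationship class (in the spirit of logit adjustment); the exact bias definition in RTPB may differ, and the names here (build_prior_bias, BiasedRelClassifier, beta) are illustrative, not the paper's API.

```python
import torch
import torch.nn as nn

def build_prior_bias(class_counts: torch.Tensor, beta: float = 1.0,
                     eps: float = 1e-12) -> torch.Tensor:
    """Tempered log-frequency of each relationship class (an assumed form).

    The bias is larger (less negative) for head classes and smaller for
    tail classes, so adding it to the logits during training forces the
    model to 'resist' it on tail categories.
    """
    freq = class_counts.float() / class_counts.sum()
    return beta * torch.log(freq + eps)

class BiasedRelClassifier(nn.Module):
    """Relationship classifier that adds the prior bias to its logits
    during training only; at inference the raw logits are used, so the
    resistance learned against the bias benefits tail classes."""

    def __init__(self, in_dim: int, num_rels: int, class_counts: torch.Tensor):
        super().__init__()
        self.fc = nn.Linear(in_dim, num_rels)
        self.register_buffer("prior_bias", build_prior_bias(class_counts))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        logits = self.fc(feats)
        return logits + self.prior_bias if self.training else logits

# Usage: counts would come from the training-set relationship statistics.
counts = torch.tensor([50000, 1200, 30])    # head, mid, tail classes
clf = BiasedRelClassifier(in_dim=512, num_rels=3, class_counts=counts)
clf.train()
biased = clf(torch.randn(4, 512))           # biased logits, fed to the loss
clf.eval()
unbiased = clf(torch.randn(4, 512))         # raw logits at inference
```

The design choice worth noting is that the bias only shapes the loss: it is added during training and dropped at test time, so tail categories are no longer suppressed by their rarity in the training distribution.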