In this paper, we propose a novel framework, dubbed peer learning, to address the problem of biased scene graph generation (SGG). The framework uses predicate sampling and consensus voting (PSCV) to encourage different peers to learn from each other, improving model diversity and mitigating bias in SGG. Predicate classes follow a heavily long-tailed distribution, and a single peer may not be sufficiently diverse to discriminate between predicates at different frequency levels. We therefore use predicate sampling to divide and conquer this problem: we split the data distribution into sub-distributions according to predicate frequency, then select and combine head, body, and tail classes and feed them to different peers as complementary predicate knowledge during training. The complementary predicate knowledge of these peers is then ensembled with a consensus voting strategy, which simulates a civilized voting process that emphasizes the majority opinion and diminishes the minority opinion. This approach helps the learned representations of each peer adapt to the various data distributions; as a result, the model is less biased and makes more balanced predicate predictions. Extensive experiments on the Visual Genome dataset demonstrate that PSCV outperforms previous methods. We establish a new state of the art (SOTA) on the SGCls task, achieving a mean of 31.6.
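The two ingredients described above, frequency-based predicate sampling and consensus voting, can be illustrated with a minimal sketch. It is not the PSCV implementation: the function names, the equal-thirds head/body/tail cut, the assignment of class groups to three peers, and the majority/minority weights are all assumptions made for illustration only.

```python
from collections import Counter


def split_head_body_tail(labels, head_frac=1 / 3, body_frac=1 / 3):
    """Rank predicate classes by frequency and cut the ranking into three groups.

    The fractions are illustrative assumptions, not values from the paper.
    """
    ranked = [cls for cls, _ in Counter(labels).most_common()]
    n_head = max(1, round(len(ranked) * head_frac))
    n_body = max(1, round(len(ranked) * body_frac))
    return {
        "head": ranked[:n_head],
        "body": ranked[n_head:n_head + n_body],
        "tail": ranked[n_head + n_body:],
    }


def peer_subsets(groups):
    """Hypothetical assignment of class groups to three peers (an assumption)."""
    return {
        "peer_all": groups["head"] + groups["body"] + groups["tail"],
        "peer_body_tail": groups["body"] + groups["tail"],
        "peer_tail": groups["tail"],
    }


def consensus_vote(peer_probs, majority_weight=1.0, minority_weight=0.5):
    """Fuse per-peer class probabilities, weighting the majority more heavily.

    Each peer votes for its most probable class. Peers that agree with the
    most common vote get `majority_weight`, the others `minority_weight`.
    Both weights are assumed values chosen for this sketch.
    """
    votes = [max(range(len(p)), key=p.__getitem__) for p in peer_probs]
    majority_class, _ = Counter(votes).most_common(1)[0]
    weights = [
        majority_weight if v == majority_class else minority_weight
        for v in votes
    ]
    total = sum(weights)
    n_classes = len(peer_probs[0])
    return [
        sum(w * p[c] for w, p in zip(weights, peer_probs)) / total
        for c in range(n_classes)
    ]


# Toy predicate labels with a long-tailed frequency profile.
toy_labels = (
    ["on"] * 6 + ["has"] * 5 + ["near"] * 4
    + ["holding"] * 3 + ["parked on"] * 2 + ["flying in"] * 1
)
groups = split_head_body_tail(toy_labels)
subsets = peer_subsets(groups)

# Three peers scoring the same three candidate predicates; two of them agree.
fused = consensus_vote([[0.6, 0.3, 0.1], [0.7, 0.2, 0.1], [0.1, 0.2, 0.7]])
```

In this toy run the two agreeing peers dominate the fused distribution, while the dissenting peer still contributes at reduced weight, which mirrors the "emphasize the majority, diminish the minority" description in the abstract.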