Scene graph generation (SGG) has made tremendous progress in recent years. However, the long-tailed distribution of its predicate classes remains a challenging problem. To cope with extremely imbalanced predicate distributions, existing approaches usually build complicated context encoders to capture the intrinsic relevance of scene context to predicates, together with complex networks to improve learning under highly imbalanced predicate distributions. To address the unbiased SGG problem, we introduce a simple yet effective method dubbed Context-Aware Mixture-of-Experts (CAME), which improves model diversity and mitigates biased SGG without complicated design. Specifically, we propose to integrate a mixture of experts with a divide-and-ensemble strategy to remedy the severely long-tailed distribution of predicate classes; this strategy is applicable to the majority of unbiased scene graph generators. Bias in SGG is thereby reduced, and the model tends to produce more evenly distributed predicate predictions. Experts with the same weights, however, are not sufficiently diverse to differentiate between the various levels of the predicate distribution. To enable the network to dynamically exploit the rich scene context and further boost model diversity, we simply use the built-in module to create a context encoder. The importance of each expert to the scene context, and of each predicate to each expert, is dynamically associated through the expert weighting (EW) and predicate weighting (PW) strategies. We have conducted extensive experiments on three tasks using the Visual Genome dataset, showing that CAME outperforms recent methods and achieves state-of-the-art performance. Our code will be made publicly available.
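The abstract does not give the formulas behind EW and PW, so the following is only a minimal sketch of the general idea, not the CAME implementation. It assumes that each expert produces predicate logits, that a context encoder yields one gating score per expert (EW, here normalized with a softmax), and that each expert carries one multiplicative weight per predicate (PW). The function name `combine_experts` and all of its parameters are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_experts(expert_logits, expert_scores, predicate_weights):
    """Fuse per-expert predicate logits into one prediction vector.

    expert_logits:     E lists of P logits, one list per expert.
    expert_scores:     E context-derived scores (assumed to come from a
                       context encoder); softmax turns them into EW.
    predicate_weights: E lists of P weights; entry [e][p] is the assumed
                       importance of predicate p to expert e (PW).
    """
    gate = softmax(expert_scores)  # expert weighting (assumed form)
    num_predicates = len(expert_logits[0])
    combined = [0.0] * num_predicates
    for g, logits, weights in zip(gate, expert_logits, predicate_weights):
        for p in range(num_predicates):
            # Each expert contributes its logit scaled by EW and PW.
            combined[p] += g * weights[p] * logits[p]
    return combined


# Illustrative usage with two experts and two predicate classes.
fused = combine_experts(
    expert_logits=[[2.0, 0.0], [0.0, 4.0]],
    expert_scores=[0.0, 0.0],
    predicate_weights=[[1.0, 1.0], [1.0, 1.0]],
)
```

With equal expert scores and unit predicate weights, the sketch reduces to a plain average of the experts; unequal scores or weights shift the fused prediction toward particular experts or predicates, which is the diversity the abstract attributes to EW and PW.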