Scene Graph Generation (SGG) represents objects and their interactions with a graph structure. Recently, many works have been devoted to solving the class-imbalance problem in SGG. However, by underestimating head predicates throughout training, these methods wreck the features of head predicates, which provide general features for the tail ones. Besides, assigning excessive attention to tail predicates leads to semantic deviation. Based on this, we propose a novel SGG framework that learns to generate scene graphs from Head to Tail (SGG-HT), containing a Curriculum Re-weight Mechanism (CRM) and a Semantic Context Module (SCM). CRM first learns head/easy samples to build robust features of head predicates and then gradually focuses on tail/hard ones. SCM is proposed to relieve semantic deviation by ensuring semantic consistency between the generated scene graph and the ground truth in both global and local representations. Experiments show that SGG-HT significantly alleviates the biased problem and achieves state-of-the-art performance on Visual Genome.
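The head-to-tail curriculum described above can be sketched as a per-class loss-weight schedule that starts near uniform (so frequent head predicates dominate early training) and anneals toward inverse-frequency weights that emphasize tail predicates. This is an illustrative sketch under assumed design choices, not the paper's exact CRM; the function name and linear annealing schedule are hypothetical.

```python
import numpy as np

def curriculum_weights(class_freq, epoch, total_epochs):
    """Hypothetical curriculum re-weighting schedule (sketch, not the
    paper's CRM): interpolate linearly from uniform class weights
    (head-focused training) to inverse-frequency weights (tail-focused)."""
    freq = np.asarray(class_freq, dtype=float)
    uniform = np.ones_like(freq)
    inverse = freq.sum() / (len(freq) * freq)   # inverse-frequency weights
    alpha = min(epoch / total_epochs, 1.0)      # 0 -> head focus, 1 -> tail focus
    w = (1.0 - alpha) * uniform + alpha * inverse
    return w / w.mean()                         # normalize so mean weight is 1

# Example: 3 predicate classes with long-tailed sample counts
freqs = [1000, 100, 10]
w_start = curriculum_weights(freqs, epoch=0, total_epochs=10)   # uniform
w_end = curriculum_weights(freqs, epoch=10, total_epochs=10)    # tail-heavy
```

At epoch 0 every predicate class is weighted equally, so the loss reflects the natural head-dominated distribution; by the final epoch the rarest class receives the largest weight, shifting attention to the tail without discarding the head features learned early on.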