The performance of current Scene Graph Generation (SGG) models is severely hampered by hard-to-distinguish predicates, e.g., woman-on/standing on/walking on-beach. Since general SGG models tend to predict head predicates while re-balancing strategies favor tail categories, neither can appropriately handle hard-to-distinguish predicates. To tackle this issue, inspired by fine-grained image classification, which focuses on differentiating hard-to-distinguish objects, we propose an Adaptive Fine-Grained Predicates Learning (FGPL-A) framework that aims at differentiating hard-to-distinguish predicates for SGG. First, we introduce an Adaptive Predicate Lattice (PL-A) to identify hard-to-distinguish predicates, which adaptively explores predicate correlations in keeping with the model's dynamic learning pace. In practice, PL-A is initialized from the SGG dataset and refined by exploring the model's predictions on the current mini-batch. Utilizing PL-A, we propose an Adaptive Category Discriminating Loss (CDL-A) and an Adaptive Entity Discriminating Loss (EDL-A), which progressively regularize the model's discriminating process with fine-grained supervision concerning its dynamic learning status, ensuring a balanced and efficient learning process. Extensive experimental results show that our proposed model-agnostic strategy significantly boosts the performance of benchmark models on the VG-SGG and GQA-SGG datasets by up to 175% and 76% on Mean Recall@100, achieving new state-of-the-art performance. Moreover, experiments on Sentence-to-Graph Retrieval and Image Captioning tasks further demonstrate the practicability of our method.
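The sketch below is not the authors' implementation; it is a minimal illustration of the adaptive idea described above: a predicate correlation matrix initialized from dataset statistics and refined with the model's soft predictions on each mini-batch. The class name, the exponential-moving-average update, and the `momentum` parameter are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (assumed design, not the authors' code): an adaptive
# predicate-correlation matrix refined from mini-batch predictions.
import torch


class AdaptivePredicateLattice:
    def __init__(self, dataset_cooccurrence: torch.Tensor, momentum: float = 0.9):
        # dataset_cooccurrence: [P, P] predicate correlations estimated from
        # the training set (e.g., label co-occurrence or confusion counts).
        self.correlation = dataset_cooccurrence.clone()
        self.momentum = momentum  # assumed EMA coefficient

    @torch.no_grad()
    def refine(self, logits: torch.Tensor, labels: torch.Tensor) -> None:
        """Refine correlations with the model's predictions on a mini-batch.

        logits: [N, P] predicate logits for N relation samples.
        labels: [N] ground-truth predicate indices (long tensor).
        """
        probs = logits.softmax(dim=-1)  # soft predictions
        num_p = self.correlation.size(0)
        batch_corr = torch.zeros_like(self.correlation)
        counts = torch.zeros(num_p, dtype=probs.dtype, device=self.correlation.device)
        # For each ground-truth predicate, accumulate the probability mass the
        # model assigns to every predicate class (a per-class confusion estimate).
        batch_corr.index_add_(0, labels, probs)
        counts.index_add_(0, labels, torch.ones_like(labels, dtype=probs.dtype))
        seen = counts > 0
        batch_corr[seen] /= counts[seen].unsqueeze(-1)
        # Exponential moving average keeps the lattice in step with the model's
        # dynamic learning pace while remaining stable across batches.
        self.correlation[seen] = (self.momentum * self.correlation[seen]
                                  + (1.0 - self.momentum) * batch_corr[seen])
```

Under this reading, the refined `correlation` matrix would then weight the discriminating losses (CDL-A, EDL-A) so that supervision concentrates on predicate pairs the model currently confuses.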