Scene graph generation (SGG) aims to extract (subject, predicate, object) triplets from images. Recent works have made steady progress on SGG and provide useful tools for high-level vision and language understanding. However, due to data distribution problems, including long-tail distribution and semantic ambiguity, the predictions of current SGG models tend to collapse to a few frequent but uninformative predicates (e.g., on, at), which limits the practical application of these models in downstream tasks. To address these problems, we propose a novel Internal and External Data Transfer (IETrans) method, which can be applied in a plug-and-play fashion and extended to large-scale SGG with 1,807 predicate classes. IETrans aims to relieve the data distribution problem by automatically creating an enhanced dataset that provides more sufficient and coherent annotations for all predicates. By training on the enhanced dataset, a Neural Motif model doubles the macro performance while maintaining competitive micro performance. The code and data are publicly available at https://github.com/waxnkw/IETrans-SGG.pytorch.
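To make the macro versus micro distinction concrete, the following is a minimal sketch on toy data, not the evaluation protocol used for SGG benchmarks. It assumes a simplified setting in which each object pair receives exactly one predicted predicate; the function name `micro_and_macro_recall` and the toy labels are hypothetical. Micro recall averages over all samples, so frequent predicates dominate it, while macro recall averages the per-predicate recalls, so every predicate counts equally.

```python
from collections import defaultdict


def micro_and_macro_recall(gold, predicted):
    """Return (micro, macro) recall for two equal-length lists of predicate labels.

    Simplifying assumption: one predicted predicate per object pair,
    with no top-K ranking.
    """
    hits = defaultdict(int)    # correct predictions per gold predicate
    totals = defaultdict(int)  # number of gold samples per predicate
    for g, p in zip(gold, predicted):
        totals[g] += 1
        if g == p:
            hits[g] += 1
    # Micro: fraction of all samples predicted correctly.
    micro = sum(hits.values()) / len(gold)
    # Macro: unweighted mean of the per-predicate recalls.
    macro = sum(hits[c] / totals[c] for c in totals) / len(totals)
    return micro, macro


# Toy long-tailed data: "on" is frequent, the informative predicates are rare.
gold = ["on"] * 8 + ["standing on", "parked on"]

# A collapsed model that always predicts the frequent predicate.
collapsed = ["on"] * 10

# A model that gets both rare predicates right but misses two "on" samples.
informative = ["on"] * 6 + ["standing on", "parked on"] + ["standing on", "parked on"]

micro_c, macro_c = micro_and_macro_recall(gold, collapsed)
micro_i, macro_i = micro_and_macro_recall(gold, informative)
print(f"collapsed:   micro={micro_c:.3f} macro={macro_c:.3f}")
print(f"informative: micro={micro_i:.3f} macro={macro_i:.3f}")
```

In this toy case both models reach the same micro recall, yet the collapsed model scores far lower on macro recall because it never predicts the rare predicates. This is the kind of gap the abstract refers to when it reports macro and micro performance separately.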