Scene-graph generation involves creating a structural representation of the relationships between objects in a scene by predicting subject-object-relation triplets from input data. However, existing methods show poor performance in detecting triplets outside of a predefined set, primarily due to their reliance on dependent feature learning, in which object and relationship features are learned jointly. To address this issue, we propose DDS -- a decoupled dynamic scene-graph generation network -- that consists of two independent branches that can disentangle the extracted features. The key innovation of this paper is the decoupling of the features representing relationships from those representing objects, which enables the detection of novel object-relationship combinations. The DDS model is evaluated on three datasets and outperforms previous methods by a significant margin, especially in detecting previously unseen triplets.
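A minimal sketch of the two-branch decoupling idea described above, not the authors' DDS implementation: object features and relationship features flow through independent branches and are only assembled into triplets at the output. All module names, feature dimensions, and the pair-indexing scheme are illustrative assumptions.

```python
import torch
import torch.nn as nn


class ObjectBranch(nn.Module):
    """Predicts object-class logits from per-region features."""
    def __init__(self, feat_dim: int, num_obj_classes: int):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, num_obj_classes),
        )

    def forward(self, region_feats: torch.Tensor) -> torch.Tensor:
        return self.head(region_feats)  # (N, num_obj_classes)


class RelationBranch(nn.Module):
    """Predicts relation logits for subject-object pairs,
    independently of the object-classification branch."""
    def __init__(self, feat_dim: int, num_rel_classes: int):
        super().__init__()
        self.pair_encoder = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, num_rel_classes),
        )

    def forward(self, subj_feats: torch.Tensor, obj_feats: torch.Tensor) -> torch.Tensor:
        pair = torch.cat([subj_feats, obj_feats], dim=-1)
        return self.pair_encoder(pair)  # (P, num_rel_classes)


class DecoupledSceneGraphSketch(nn.Module):
    """Two independent branches; subject-object-relation triplets are
    composed only at prediction time, so unseen combinations of known
    objects and known relations can still be scored."""
    def __init__(self, feat_dim=256, num_obj_classes=150, num_rel_classes=50):
        super().__init__()
        self.obj_branch = ObjectBranch(feat_dim, num_obj_classes)
        self.rel_branch = RelationBranch(feat_dim, num_rel_classes)

    def forward(self, region_feats: torch.Tensor, pair_index: torch.Tensor):
        # region_feats: (N, feat_dim) features for N detected regions
        # pair_index:   (P, 2) subject/object indices for P candidate pairs
        obj_logits = self.obj_branch(region_feats)
        rel_logits = self.rel_branch(region_feats[pair_index[:, 0]],
                                     region_feats[pair_index[:, 1]])
        return obj_logits, rel_logits


if __name__ == "__main__":
    model = DecoupledSceneGraphSketch()
    feats = torch.randn(4, 256)             # 4 dummy region features
    pairs = torch.tensor([[0, 1], [2, 3]])  # 2 candidate subject-object pairs
    obj_logits, rel_logits = model(feats, pairs)
    print(obj_logits.shape, rel_logits.shape)  # (4, 150) and (2, 50)
```

Because the relation branch never conditions on the predicted object classes, a combination such as a known relation applied to an object pair never seen together during training can still receive a high score, which is the intuition behind the improved detection of unseen triplets.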