Many applications require an understanding of an image that goes beyond the simple detection and classification of its objects. In particular, a great deal of semantic information is carried in the relationships between objects. We have previously shown that combining a visual model with a statistical semantic prior model improves performance on the task of mapping images to their associated scene descriptions. In this paper, we review that model and compare it to a novel conditional multi-way model for visual relationship detection, which does not include an explicitly trained visual prior model. We also discuss potential relationships between the proposed methods and memory models of the human brain.
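To make the idea of combining visual evidence with a statistical semantic prior concrete, here is a minimal sketch. All function names, shapes, and numbers are illustrative assumptions, not the paper's actual method: it shows one common fusion scheme in which a visual model's predicate scores for a (subject, object) pair are multiplied by a smoothed co-occurrence prior and renormalized.

```python
import numpy as np

def fuse_scores(visual_logits, prior_counts, alpha=1.0):
    """Fuse visual evidence with a smoothed semantic prior (illustrative only).

    visual_logits: shape (n_predicates,), raw scores from a visual model
                   for each candidate predicate (e.g. "on", "next to", "riding")
    prior_counts:  shape (n_predicates,), how often each predicate co-occurred
                   with this (subject, object) pair in training data
    alpha:         Laplace smoothing constant, so unseen triples keep
                   nonzero prior probability
    """
    # Softmax over the visual logits (shifted for numerical stability).
    visual_prob = np.exp(visual_logits - visual_logits.max())
    visual_prob /= visual_prob.sum()
    # Smoothed relative-frequency prior over predicates.
    prior_prob = (prior_counts + alpha) / (prior_counts + alpha).sum()
    # Posterior-style fusion: elementwise product, renormalized.
    fused = visual_prob * prior_prob
    return fused / fused.sum()

# Hypothetical example with 3 candidate predicates.
visual = np.array([2.0, 0.5, 0.1])
counts = np.array([50.0, 5.0, 0.0])
print(fuse_scores(visual, counts))
```

The prior can sharpen or override ambiguous visual evidence: a predicate that is visually plausible but semantically rare for the pair is down-weighted, which is the intuition behind fusing the two models.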