Learning structured representations of visual scenes is currently a major bottleneck to bridging perception with reasoning. While there has been exciting progress with slot-based models, which learn to segment scenes into sets of objects, learning configurational properties of entire groups of objects is still under-explored. To address this problem, we introduce Constellation, a network that learns relational abstractions of static visual scenes, and generalises these abstractions over sensory particularities, thus offering a potential basis for abstract relational reasoning. We further show that this basis, along with language association, provides a means to imagine sensory content in new ways. This work is a first step in the explicit representation of visual relationships and using them for complex cognitive procedures.
翻译:视觉场景的学习结构化表现目前是将感知与推理联系起来的一个主要瓶颈。虽然基于时间档的模式取得了令人振奋的进展,这些模型学会将场景分成一组物体,但学习整个一组物体的配置特性仍然未得到充分探讨。为了解决这一问题,我们引入星座,这是一个学习静态视觉场景关联抽象学的网络,这些抽象的图象与感官特征的概观,从而为抽象关系推理提供了潜在的基础。我们进一步表明,这一基础与语言联系一道,为以新的方式想象感官内容提供了一种手段。这项工作是明确描述视觉关系并将它们用于复杂的认知程序的第一步。