The key to high-level cognition is believed to be the ability to systematically manipulate and compose knowledge pieces. While token-like structured knowledge representations are naturally provided in text, it remains unclear how to obtain them for unstructured modalities such as scene images. In this paper, we propose a neural mechanism called Neural Systematic Binder, or SysBinder, for constructing a novel structured representation called Block-Slot Representation. In Block-Slot Representation, object-centric representations known as slots are constructed by composing a set of independent factor representations called blocks, to facilitate systematic generalization. SysBinder obtains this structure in an unsupervised way by alternately applying two different binding principles: spatial binding for spatial modularity across the full scene and factor binding for factor modularity within an object. SysBinder is a simple, deterministic, and general-purpose layer that can be applied as a drop-in module in any neural network and on any modality. In experiments, we find that SysBinder provides significantly better factor disentanglement within the slots than conventional object-centric methods, including, for the first time, in visually complex scene images such as CLEVR-Tex. Furthermore, we demonstrate factor-level systematicity in controlled scene generation by decoding unseen factor combinations.
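To make the alternation between the two binding principles concrete, the following is a minimal NumPy sketch under our own simplifying assumptions; it is not the SysBinder implementation. Spatial binding is modeled as a slot-attention-style step in which slots compete (softmax over slots) for input features. Factor binding splits each slot into M blocks and re-expresses each block as a soft mixture of a per-block set of K prototype vectors. The function names, the prototype tensor, the identity query/key/value projections, and the plain replacement update (standing in for learned refinement networks) are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis):
    # numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def spatial_binding(inputs, slots, eps=1e-8):
    """Slot-attention-style binding: slots compete for input features.

    inputs: (N, D), slots: (S, D). Identity projections are an assumption
    made here for brevity (no learned query/key/value maps).
    """
    dim = inputs.shape[-1]
    logits = inputs @ slots.T / np.sqrt(dim)                # (N, S)
    attn = softmax(logits, axis=1)                          # normalize over slots: competition
    attn = attn / (attn.sum(axis=0, keepdims=True) + eps)   # weighted mean over inputs
    return attn.T @ inputs                                  # (S, D) per-slot updates


def factor_binding(slots, prototypes):
    """Re-express each block of each slot as a soft mix of that block's prototypes.

    slots: (S, M*d); prototypes: (M, K, d), a hypothetical prototype set per block.
    """
    num_slots = slots.shape[0]
    m, k, d = prototypes.shape
    blocks = slots.reshape(num_slots, m, d)                              # (S, M, d)
    logits = np.einsum("smd,mkd->smk", blocks, prototypes) / np.sqrt(d)  # (S, M, K)
    weights = softmax(logits, axis=-1)                                   # per-block mixture weights
    bound = np.einsum("smk,mkd->smd", weights, prototypes)               # (S, M, d)
    return bound.reshape(num_slots, m * d)


def block_slot_iterations(inputs, slots, prototypes, n_iters=3):
    # Alternate spatial binding (across the scene) and factor binding (within a slot).
    # The plain replacement update stands in for learned refinement in a real model.
    for _ in range(n_iters):
        slots = spatial_binding(inputs, slots)
        slots = factor_binding(slots, prototypes)
    return slots


rng = np.random.default_rng(0)
N, S, M, d, K = 64, 4, 8, 16, 32  # hypothetical sizes: inputs, slots, blocks, block dim, prototypes
inputs = rng.normal(size=(N, M * d))
slots = rng.normal(size=(S, M * d))
prototypes = rng.normal(size=(M, K, d))
out = block_slot_iterations(inputs, slots, prototypes)
print(out.shape)  # (4, 128)
```

In this sketch, each block after factor binding is a convex combination of that block's own prototypes, so blocks can be recombined across slots independently; whether a block ends up corresponding to a single interpretable factor is something a trained model must learn, not something this code guarantees.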