Visual environments are structured, consisting of distinct objects or entities. These entities have properties, both visible and latent, that determine the manner in which they interact with one another. To partition images into entities, deep-learning researchers have proposed structural inductive biases such as slot-based architectures. To model interactions among entities, equivariant graph neural nets (GNNs) are used, but these are not particularly well suited to the task for two reasons. First, GNNs do not predispose interactions to be sparse, as relationships among independent entities are likely to be. Second, GNNs do not factorize knowledge about interactions in an entity-conditional manner. As an alternative, we take inspiration from cognitive science and resurrect a classic approach, production systems, which consist of a set of rule templates that are applied by binding placeholder variables in the rules to specific entities. Rules are scored on their match to entities, and the best-fitting rules are applied to update entity properties. In a series of experiments, we demonstrate that this architecture achieves a flexible, dynamic flow of control and serves to factorize entity-specific and rule-based information. This disentangling of knowledge achieves robust future-state prediction in rich visual environments, outperforming state-of-the-art methods using GNNs, and allows for extrapolation from simple (few-object) environments to more complex environments.
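To make the score-bind-update cycle concrete, here is a minimal Python sketch of one production-system step under heavy simplifying assumptions: entity states are hand-written vectors with a hypothetical layout `[moving, colliding, x, velocity]`, each rule has a fixed key vector scored against entities by a dot product, selection is a hard argmax, and the two rules (`move`, `bounce`) and their update functions are illustrative inventions. In the architecture described above, entity representations, rule scoring, and rule updates would be learned; this is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Entity state layout (hypothetical): [moving, colliding, x, velocity]
State = List[float]
Update = Callable[[State, State], State]


def dot(a: State, b: State) -> float:
    return sum(x * y for x, y in zip(a, b))


@dataclass
class Rule:
    """A rule template with two placeholder variables: a primary and a context entity."""
    name: str
    key: State      # compared with an entity's state to score how well the rule matches it
    update: Update  # (primary state, context state) -> new primary state


def bind(rules: List[Rule], entities: Dict[str, State]) -> Tuple[Rule, str, str]:
    """Pick the best-fitting (rule, primary entity) pair, then a context entity for it."""
    rule, primary = max(
        ((r, name) for r in rules for name in entities),
        key=lambda pair: dot(pair[0].key, entities[pair[1]]),
    )
    others = [name for name in entities if name != primary]
    context = max(others, key=lambda name: dot(rule.key, entities[name]))
    return rule, primary, context


def step(rules: List[Rule], entities: Dict[str, State]) -> Tuple[Dict[str, State], Tuple[str, str, str]]:
    """Apply one rule to one entity; every other entity is left untouched (a sparse update)."""
    rule, primary, context = bind(rules, entities)
    updated = dict(entities)  # shallow copy: the input dict is not mutated
    updated[primary] = rule.update(entities[primary], entities[context])
    return updated, (rule.name, primary, context)


RULES = [
    # "move": matches entities flagged as moving; advances position by velocity.
    Rule("move", [1.0, 0.0, 0.0, 0.0], lambda p, c: [p[0], p[1], p[2] + p[3], p[3]]),
    # "bounce": matches colliding entities more strongly than "move"; reflects the
    # velocity off the context entity (1-D elastic collision with an immovable body).
    Rule("bounce", [0.0, 2.0, 0.0, 0.0], lambda p, c: [p[0], 0.0, p[2], 2 * c[3] - p[3]]),
]

if __name__ == "__main__":
    world = {"ball": [1.0, 0.0, 0.0, 1.0], "wall": [0.0, 0.0, 5.0, 0.0]}
    world, fired = step(RULES, world)
    print(fired, world["ball"])  # ('move', 'ball', 'wall') [1.0, 0.0, 1.0, 1.0]
```

Because `step` fires a single rule on a single bound entity, the wall's state is left unchanged, which is the kind of sparsity the abstract contrasts with a fully connected GNN that updates every node at every step. The rule's knowledge (how to bounce) lives in the rule, while the bound entities supply the specifics. A trainable version would need a differentiable substitute for the hard argmax used here.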