Reinforcement Learning (RL) can enable agents to learn complex tasks. However, the learned knowledge is difficult to interpret and to reuse across tasks. Inductive biases can address such issues by explicitly providing a generic yet useful decomposition that would otherwise be difficult or expensive to learn implicitly. For example, object-centered approaches decompose a high-dimensional observation into individual objects. Expanding on this, we utilize an inductive bias for explicit object-centered knowledge separation that further decomposes object knowledge into semantic representations and dynamics knowledge. To this end, we introduce a semantic module that predicts an object's semantic state based on its context. The resulting affordance-like object state can then be used to enrich perceptual object representations. With a minimal setup and an environment that enables puzzle-like tasks, we demonstrate the feasibility and benefits of this approach. Specifically, we compare three different methods of integrating semantic representations into a model-based RL architecture. Our experiments show that the degree of explicitness in knowledge separation correlates with faster learning, better accuracy, better generalization, and better interpretability.