Flexible, goal-directed behavior is a fundamental aspect of human life. Based on the free energy minimization principle, the theory of active inference formalizes the generation of such behavior from a computational neuroscience perspective. Building on this theory, we introduce an output-probabilistic, temporally predictive, modular artificial neural network architecture, which processes sensorimotor information, infers behavior-relevant aspects of its world, and invokes highly flexible, goal-directed behavior. We show that our architecture, which is trained end-to-end to minimize an approximation of free energy, develops latent states that can be interpreted as affordance maps. That is, the emerging latent states signal which actions lead to which effects, depending on the local context. We further show that, in combination with active inference, these emerging affordance maps can be used to invoke flexible, goal-directed behavior. As a result, our simulated agent flexibly steers through continuous spaces, avoids collisions with obstacles, and prefers pathways that lead to the goal with high certainty. Additionally, we show that the trained agent is highly suitable for zero-shot generalization across environments: after training in a handful of fixed environments with obstacles and other terrains affecting its behavior, it performs similarly well in procedurally generated environments containing different numbers of obstacles and terrains of various sizes at different locations.