Flexible, goal-directed behavior is a fundamental aspect of human life. Building on the free energy minimization principle, the theory of active inference formalizes the generation of such behavior from a computational neuroscience perspective. Based on this theory, we introduce an output-probabilistic, temporally predictive, modular artificial neural network architecture that processes sensorimotor information, infers behavior-relevant aspects of its world, and invokes highly flexible, goal-directed behavior. We show that our architecture, which is trained end-to-end to minimize an approximation of free energy, develops latent states that can be interpreted as affordance maps. That is, the emerging latent states signal which actions lead to which effects, depending on the local context. We further show that, in combination with active inference, these emerging affordance maps can be used to invoke flexible, goal-directed behavior. As a result, our simulated agent flexibly steers through continuous spaces, avoids collisions with obstacles, and prefers pathways that lead to the goal with high certainty. Additionally, we show that the trained agent is highly suitable for zero-shot generalization across environments: after training in a handful of fixed environments with obstacles and other terrains affecting its behavior, it performs similarly well in procedurally generated environments containing different numbers of obstacles and terrains of various sizes at different locations. To improve and focus model learning further, we plan in the near future to invoke active inference-based, information-gain-oriented behavior also while learning the temporally predictive model itself. Moreover, we intend to foster the development of both deeper event-predictive abstractions and compact, habitual behavioral primitives.
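To illustrate the idea of preferring pathways that reach the goal with high certainty, the following is a minimal sketch and not the architecture described above. It assumes a hand-written toy forward model (`predict`) in place of the learned network, a hypothetical "noisy" terrain band that inflates predictive variance, and the expected squared distance to the goal under an isotropic 2D Gaussian prediction as a simple stand-in for a free-energy-like objective. All names and constants are illustrative assumptions.

```python
import itertools

# Hypothetical action set for a 2D agent (assumption, not from the source).
ACTIONS = {"right": (1.0, 0.0), "up": (0.0, 1.0), "down": (0.0, -1.0)}


def predict(mean, var, action):
    """Toy forward model (assumption): next predicted mean and variance.

    Entering the band 1 <= x < 2 with y < 1 stands in for uncertain
    terrain and adds much more predictive variance than elsewhere.
    """
    dx, dy = ACTIONS[action]
    nxt = (mean[0] + dx, mean[1] + dy)
    noisy = 1.0 <= nxt[0] < 2.0 and nxt[1] < 1.0
    return nxt, var + (1.0 if noisy else 0.05)


def expected_cost(plan, start, goal):
    """Expected squared goal distance after executing the plan.

    For an isotropic 2D Gaussian N(mu, var * I):
    E[||x - g||^2] = ||mu - g||^2 + 2 * var,
    so uncertain predictions are penalized automatically.
    """
    mean, var = start, 0.0
    for action in plan:
        mean, var = predict(mean, var, action)
    dist_sq = (mean[0] - goal[0]) ** 2 + (mean[1] - goal[1]) ** 2
    return dist_sq + 2.0 * var


def best_plan(start, goal, horizon):
    """Exhaustively score all action sequences of the given length."""
    candidates = itertools.product(ACTIONS, repeat=horizon)
    return min(candidates, key=lambda p: expected_cost(p, start, goal))


if __name__ == "__main__":
    start, goal = (0.0, 0.0), (2.0, 0.0)
    print(best_plan(start, goal, horizon=4))
```

In this toy setting the direct two-step route crosses the noisy band, so its expected cost is dominated by accumulated variance, whereas the four-step detour around the band reaches the same goal position with far lower predictive uncertainty. Exhaustive search is used only for brevity; it is not meant to reflect how action sequences are optimized in the architecture above.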