What is the difference between goal-directed and habitual behavior? We propose a novel computational framework for decision making based on Bayesian inference, in which both kinds of behavior are integrated into a single neural network model. Through self-exploration, the model learns to predict environmental state transitions and to generate motor actions by sampling stochastic internal states $z$. Habitual behavior, which is obtained from the prior distribution of $z$, is acquired by reinforcement learning. Goal-directed behavior is determined from the posterior distribution of $z$ by planning with active inference, which minimizes the free energy for the goal observation. We demonstrate the effectiveness of the proposed framework with experiments on a sensorimotor navigation task with camera observations and continuous motor actions.
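The contrast between sampling $z$ from the prior (habitual) and inferring a posterior over $z$ that minimizes free energy for a goal observation (goal-directed) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the proposed model: the linear map `W`, `b` stands in for a learned prediction network, the prior is a fixed isotropic Gaussian rather than a policy trained by reinforcement learning, and the posterior is given the same variance as the prior so that the KL term reduces to a squared distance between means.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy stand-ins for learned networks (assumptions, not the paper's model).
W = np.array([[1.0, 0.5], [-0.3, 1.2]])  # linear map: internal state z -> predicted observation
b = np.array([0.1, -0.2])
prior_mean = np.zeros(2)  # prior over z, standing in for the habitual policy
prior_std = 1.0
obs_std = 0.5             # assumed observation noise


def predict_observation(z):
    """Toy prediction of the observation that follows from internal state z."""
    return W @ z + b


def free_energy(q_mean, goal):
    """Goal prediction error plus KL(q || prior), with equal isotropic variances."""
    err = goal - predict_observation(q_mean)
    accuracy = 0.5 * np.sum(err**2) / obs_std**2
    kl = 0.5 * np.sum((q_mean - prior_mean) ** 2) / prior_std**2
    return accuracy + kl


def free_energy_grad(q_mean, goal):
    """Analytic gradient of free_energy with respect to q_mean."""
    err = goal - predict_observation(q_mean)
    return -(W.T @ err) / obs_std**2 + (q_mean - prior_mean) / prior_std**2


def habitual_z():
    """Habitual mode: sample z directly from the prior."""
    return prior_mean + prior_std * rng.standard_normal(2)


def goal_directed_posterior_mean(goal, steps=200, lr=0.05):
    """Goal-directed mode: gradient descent on free energy, starting from the prior mean."""
    q_mean = prior_mean.copy()
    for _ in range(steps):
        q_mean -= lr * free_energy_grad(q_mean, goal)
    return q_mean


goal = np.array([1.0, 0.5])
q_mean = goal_directed_posterior_mean(goal)
z_goal = q_mean + prior_std * rng.standard_normal(2)  # sample z from the inferred posterior
z_habit = habitual_z()
```

In this sketch the KL term keeps the goal-directed posterior close to the prior, so planning starts from the habitual solution and departs from it only as far as the goal requires.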