This paper introduces a paradigm shift by viewing the task of affect modeling as a reinforcement learning (RL) process. According to the proposed paradigm, RL agents learn a policy (i.e., affective interaction) by attempting to maximize a set of rewards (i.e., behavioral and affective patterns) via their experience with their environment (i.e., context). Our first hypothesis is that RL is an effective paradigm for interweaving affect elicitation and manifestation with behavioral and affective demonstrations. Importantly, our second hypothesis, building on Damasio's somatic marker hypothesis, is that emotion can be a facilitator of decision-making. We test our hypotheses in a racing game by training Go-Blend agents to model human demonstrations of arousal and behavior; Go-Blend is a modified version of the Go-Explore algorithm, which has recently demonstrated state-of-the-art performance in hard-exploration tasks. We first vary the arousal-based reward function and observe agents that can effectively display a palette of affect and behavioral patterns according to the specified reward. Then we use arousal-based state selection mechanisms in order to bias the strategies that Go-Blend explores. Our findings suggest that Go-Blend is not only an efficient affect modeling paradigm but, more importantly, that affect-driven RL improves exploration and yields higher-performing agents, validating Damasio's hypothesis in the domain of games.
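The reward blending described above can be sketched as a weighted combination of a behavioral score and an arousal-matching score. This is a minimal, hypothetical illustration, not the paper's actual implementation: the function name, the weighting parameter `lam`, and the assumption that both arousal traces are normalized to [0, 1] are all assumptions introduced here for clarity.

```python
def blended_reward(behavior_reward: float,
                   agent_arousal: float,
                   human_arousal: float,
                   lam: float = 0.5) -> float:
    """Blend a behavioral reward with an arousal-matching reward.

    Hypothetical sketch: the arousal-matching term is 1 minus the
    absolute difference between the agent's arousal value and the
    human demonstration's, both assumed normalized to [0, 1].
    `lam` trades off behavior (lam=1) against affect (lam=0).
    """
    arousal_match = 1.0 - abs(agent_arousal - human_arousal)
    return lam * behavior_reward + (1.0 - lam) * arousal_match
```

Varying `lam` corresponds to varying the arousal-based reward function: at one extreme the agent purely optimizes game performance, at the other it purely imitates the human arousal trace.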