Positive affect has been linked to increased interest, curiosity and satisfaction in human learning. In reinforcement learning, extrinsic rewards are often sparse and difficult to define; intrinsically motivated learning can help address these challenges. We argue that positive affect is an important intrinsic reward that helps drive exploration useful for gathering experiences. We present a novel approach leveraging a task-independent reward function trained on spontaneous smile behavior that reflects the intrinsic reward of positive affect. To evaluate our approach, we trained models for several downstream computer vision tasks on data collected with our policy and with several baseline methods. We show that the policy based on our affective rewards increases the duration of episodes and the area explored, and reduces collisions. The result is faster learning on several downstream computer vision tasks.