Designing appropriate reward functions for reinforcement learning (RL) has long been a significant challenge, especially in complex environments such as Atari games. Using natural language instructions to provide intermediate rewards to an RL agent, a process known as reward shaping, can help the agent reach the goal state faster. In this work, we propose a natural language-based reward shaping approach that maps trajectories from the Montezuma's Revenge game environment to corresponding natural language instructions using an extension of the LanguagE-Action Reward Network (LEARN) framework, which we call Ext-LEARN. These trajectory-language mappings are then used to generate intermediate rewards, which are integrated into the reward function and can be used to learn an optimal policy with any standard RL algorithm. On a set of 15 tasks from Atari's Montezuma's Revenge, the Ext-LEARN approach completes tasks more often on average than reward shaping with the original LEARN framework, and performs better still than reward shaping without natural language-based rewards.
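The shaping scheme the abstract describes can be sketched as adding a language-based intermediate reward to the environment reward. The sketch below is purely illustrative: `language_reward` is a hypothetical stand-in for the learned trajectory-language relatedness score (in the paper this is produced by the LEARN network), and the mixing weight `lam` is an assumed hyperparameter, not taken from the source.

```python
def language_reward(trajectory, instruction):
    # Hypothetical stand-in for a learned trajectory-language relatedness
    # score: fraction of instruction tokens that match actions taken in the
    # trajectory. Purely illustrative, not the LEARN model itself.
    actions = set(trajectory)
    tokens = instruction.lower().split()
    return sum(t in actions for t in tokens) / len(tokens)

def shaped_reward(env_reward, trajectory, instruction, lam=0.1):
    # The language-based intermediate reward is added to the environment
    # reward; any standard RL algorithm can then train on the shaped signal.
    return env_reward + lam * language_reward(trajectory, instruction)

# Example: a trajectory that partially follows the instruction earns a small
# intermediate reward even when the sparse environment reward is zero.
r = shaped_reward(0.0, ["jump", "climb"], "climb the ladder")
```

In sparse-reward games like Montezuma's Revenge, this kind of intermediate signal gives the agent feedback long before it reaches a scoring state, which is the motivation for reward shaping stated above.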