Designing reward functions for reinforcement learning is difficult: besides specifying which behavior is rewarded for a task, the reward function also has to discourage undesired outcomes. A misspecified reward function can lead to unintended negative side effects and overall unsafe behavior. To address this problem, recent work has proposed augmenting the specified reward function with an impact regularizer that discourages behavior with a large impact on the environment. Although initial results with impact regularizers seem promising for mitigating some types of side effects, important challenges remain. In this paper, we examine the main current challenges of impact regularizers and relate them to fundamental design decisions. We discuss in detail which challenges recent approaches address and which remain unsolved. Finally, we explore promising directions for overcoming the unsolved challenges in preventing negative side effects with impact regularizers.
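To make the idea of an impact regularizer concrete, a minimal sketch of the augmented objective is given below. The notation is illustrative rather than the formulation of any single approach: $R$ denotes the specified task reward, $d$ a deviation measure, $s'_t$ a baseline state, and $\lambda$ a trade-off weight.

\[
R_{\mathrm{aug}}(s_t, a_t) \;=\; R(s_t, a_t) \;-\; \lambda \, d(s_t, s'_t)
\]

Here $d(s_t, s'_t)$ penalizes how far the current state $s_t$ deviates from the baseline state $s'_t$ (for example, the state that would have been reached had the agent not acted), and $\lambda > 0$ controls how strongly impact is discouraged relative to task performance.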