This paper considers the problem of reward design for autonomous driving (AD), with insights that also apply to the design of cost functions and performance metrics more generally. We develop eight simple sanity checks for identifying flaws in reward functions. Applying these checks to reward functions from past work on reinforcement learning (RL) for autonomous driving reveals near-universal flaws in reward design for AD, flaws that may also be pervasive in reward design for other tasks. Lastly, we explore promising directions that may help future researchers design reward functions for AD.