Many potential applications of reinforcement learning (RL) require guarantees that the agent will perform well in the face of disturbances to the dynamics or reward function. In this paper, we prove theoretically that standard maximum entropy (MaxEnt) RL is robust to some disturbances in the dynamics and the reward function. While this capability of MaxEnt RL has been observed empirically in prior work, to the best of our knowledge, our work provides the first rigorous proof and theoretical characterization of the MaxEnt RL robust set. While a number of prior robust RL algorithms have been designed to handle similar disturbances to the reward function or dynamics, these methods typically require adding extra moving parts and hyperparameters on top of a base RL algorithm. In contrast, our theoretical results suggest that MaxEnt RL by itself is robust to certain disturbances, without requiring any additional modifications. While this does not imply that MaxEnt RL is the best available robust RL method, MaxEnt RL does possess a striking simplicity and appealing formal guarantees.
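For reference, a standard formulation of the MaxEnt RL objective augments the expected discounted return with a policy-entropy bonus; the discounted infinite-horizon setting and the temperature \(\alpha\) below follow the usual convention and are not specific to the results stated above:
\[
J_{\text{MaxEnt}}(\pi) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\Bigl(r(s_t, a_t) + \alpha\,\mathcal{H}\bigl(\pi(\cdot \mid s_t)\bigr)\Bigr)\right],
\]
where \(\mathcal{H}\) denotes Shannon entropy and \(\alpha > 0\) controls the strength of the entropy bonus relative to the reward.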