We study the inverse reinforcement learning (IRL) problem under a transition dynamics mismatch between the expert and the learner. Specifically, we consider the Maximum Causal Entropy (MCE) IRL learner model and provide a tight upper bound on the learner's performance degradation in terms of the $\ell_1$-distance between the transition dynamics of the expert and the learner. Leveraging insights from the Robust RL literature, we propose a robust MCE IRL algorithm, a principled approach to mitigating this mismatch. Finally, we empirically demonstrate the stable performance of our algorithm, compared to the standard MCE IRL algorithm, under transition dynamics mismatches in both finite and continuous MDP problems.
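For intuition on the form such a degradation bound takes, a simulation-lemma-style bound can be sketched as follows; this is an illustrative sketch under generic assumptions (rewards bounded by $R_{\max}$, discount factor $\gamma$, transition kernels $T$ and $T'$ for the expert and the learner), not necessarily the paper's exact statement or constants:
% Illustrative simulation-lemma-style bound (sketch, not the paper's exact result):
% for any fixed policy \pi, rewards in [0, R_max], and discount \gamma \in [0, 1),
\[
\bigl| V^{\pi}_{T} - V^{\pi}_{T'} \bigr|
\;\le\;
\frac{\gamma R_{\max}}{(1-\gamma)^2}\,
\max_{s,a} \bigl\| T(\cdot \mid s,a) - T'(\cdot \mid s,a) \bigr\|_{1}.
\]
Bounds of this flavor follow from telescoping the value difference across the two transition kernels, which is why the dependence is on the worst-case $\ell_1$-distance between the dynamics.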