We study the inverse reinforcement learning (IRL) problem under a transition dynamics mismatch between the expert and the learner. Specifically, we consider the Maximum Causal Entropy (MCE) IRL learner model and provide a tight upper bound on the learner's performance degradation in terms of the $\ell_1$-distance between the expert's and the learner's transition dynamics. Then, leveraging insights from the robust RL literature, we propose a robust MCE IRL algorithm, a principled approach to mitigating this mismatch. Finally, we empirically demonstrate the stable performance of our algorithm, compared to the standard MCE IRL algorithm, under transition dynamics mismatch in finite MDP problems.