Inverse Reinforcement Learning (IRL) is the problem of finding a reward function that explains observed or known expert behavior. IRL is useful for automated control when the reward function is difficult to specify manually, a difficulty that impedes the direct use of reinforcement learning. We present a new IRL algorithm for the continuous state space setting with unknown transition dynamics, which models the system using a basis of orthonormal functions. We give a proof of correctness and formal guarantees on the sample and time complexity of the algorithm.
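To illustrate what modeling with a basis of orthonormal functions can look like, the following is a minimal sketch that expands a function on a one-dimensional state space [0, 1] in a cosine basis. The abstract does not say which basis the algorithm uses, so the cosine basis, the midpoint-rule quadrature, and the names `cosine_basis`, `project`, `reconstruct`, and `gram_matrix` are all assumptions made for illustration only; this is not the algorithm itself.

```python
import numpy as np

def cosine_basis(x, n_terms):
    """Evaluate the first n_terms orthonormal cosine functions on [0, 1] at x.

    Assumed basis (not taken from the source): phi_0(x) = 1 and
    phi_k(x) = sqrt(2) * cos(k * pi * x) for k >= 1.
    """
    x = np.asarray(x, dtype=float)
    k = np.arange(n_terms)
    phi = np.sqrt(2.0) * np.cos(np.pi * k[:, None] * x[None, :])
    phi[0, :] = 1.0  # the constant function already has unit norm
    return phi  # shape (n_terms, len(x))

def midpoint_grid(n_grid):
    """Midpoints of n_grid equal cells covering [0, 1]."""
    return (np.arange(n_grid) + 0.5) / n_grid

def project(f, n_terms, n_grid=2000):
    """Approximate the coefficients <f, phi_k> with a midpoint rule."""
    x = midpoint_grid(n_grid)
    return cosine_basis(x, n_terms) @ f(x) / n_grid

def reconstruct(coeffs, x):
    """Evaluate the truncated expansion sum_k coeffs[k] * phi_k(x)."""
    return coeffs @ cosine_basis(x, len(coeffs))

def gram_matrix(n_terms, n_grid=2000):
    """Numerical inner products <phi_j, phi_k>; close to the identity."""
    phi = cosine_basis(midpoint_grid(n_grid), n_terms)
    return phi @ phi.T / n_grid
```

A hypothetical usage: take a smooth bump such as `f = lambda x: np.exp(-(x - 0.5) ** 2 / 0.02)`, compute `coeffs = project(f, 20)`, and compare `reconstruct(coeffs, x)` with `f(x)` on a grid. Orthonormality is what lets each coefficient be computed as an independent inner product, so adding basis functions does not change the coefficients already found.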