Inverse reinforcement learning (IRL) is a common technique for inferring human preferences from data. Standard IRL techniques tend to assume that the human demonstrator is stationary, that is, that their policy $\pi$ doesn't change over time. In practice, humans interacting with a novel environment or learning to perform well on a novel task will change their demonstrations as they learn more about the environment or task. We investigate the consequences of relaxing this assumption of stationarity, in particular by modelling the human demonstrator as a learner. Surprisingly, we find in some small examples that this can lead to better inference than if the demonstrator were stationary. That is, by observing a demonstrator who is themselves learning, a machine can infer more than by observing a demonstrator who is noisily rational. In addition, we find evidence that model misspecification can lead to poor inference, suggesting that modelling human learning is important, especially when the human is facing an unfamiliar environment.
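The claim that a learning demonstrator can be more informative than a noisily rational one can be made concrete with a toy example. The sketch below is not from the paper; it assumes a two-armed bandit with deterministic rewards, a demonstrator who acts Boltzmann-rationally with respect to its current value estimates and updates them as it pulls arms, and an observer performing exact Bayesian inference over which arm is rewarding. Evaluating the same demonstration under a (misspecified) stationary Boltzmann model illustrates how wrong assumptions about the demonstrator can degrade inference. All names and parameters (`BETA`, `simulate_learner`, etc.) are illustrative choices, not the paper's setup.

```python
import numpy as np

# Illustrative toy example (assumed setup, not the paper's experiments).
rng = np.random.default_rng(0)
BETA = 3.0       # Boltzmann (inverse-temperature) rationality parameter
T = 10           # number of demonstrated arm pulls
THETAS = [0, 1]  # hypotheses: which of the two arms pays reward 1

def reward(theta, arm):
    """Deterministic reward: arm `theta` pays 1, the other pays 0."""
    return 1.0 if arm == theta else 0.0

def softmax(q, beta):
    z = np.exp(beta * (q - q.max()))
    return z / z.sum()

def simulate_learner(theta, beta=BETA, horizon=T):
    """Demonstrator that starts ignorant (Q = 0) and learns from the rewards it sees."""
    q = np.zeros(2)
    actions = []
    for _ in range(horizon):
        a = rng.choice(2, p=softmax(q, beta))
        q[a] = reward(theta, a)  # deterministic world: one pull reveals r(a)
        actions.append(a)
    return actions

def loglik_learner(actions, theta, beta=BETA):
    """Likelihood of the actions if the demonstrator is this kind of learner."""
    q = np.zeros(2)
    ll = 0.0
    for a in actions:
        ll += np.log(softmax(q, beta)[a])
        q[a] = reward(theta, a)
    return ll

def loglik_stationary(actions, theta, beta=BETA):
    """Likelihood under a stationary, noisily rational (Boltzmann) demonstrator."""
    q = np.array([reward(theta, arm) for arm in range(2)])
    p = softmax(q, beta)
    return sum(np.log(p[a]) for a in actions)

def posterior(actions, loglik):
    """Bayesian posterior over theta under a uniform prior."""
    logp = np.array([loglik(actions, th) for th in THETAS])
    p = np.exp(logp - logp.max())
    return p / p.sum()

true_theta = 1
demo = simulate_learner(true_theta)
print("demonstrated actions:", demo)
print("posterior (learning model):  ", posterior(demo, loglik_learner))
print("posterior (stationary model):", posterior(demo, loglik_stationary))
```

The qualitative point of the sketch: the way the learner's behaviour changes over time (exploring, then exploiting once the good arm is found) carries information about the reward that a stationary demonstrator's identically distributed actions would not, and interpreting that trajectory with a stationary model discards or distorts that information.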