Modern model-free reinforcement learning methods have recently demonstrated impressive results on a number of problems. However, complex domains like dexterous manipulation remain a challenge due to the high sample complexity. To address this, current approaches employ expert demonstrations in the form of state-action pairs, which are difficult to obtain for real-world settings such as learning from videos. In this paper, we move toward a more realistic setting and explore state-only imitation learning. To tackle this setting, we train an inverse dynamics model and use it to predict actions for state-only demonstrations. The inverse dynamics model and the policy are trained jointly. Our method performs on par with state-action approaches and considerably outperforms RL alone. By not relying on expert actions, we are able to learn from demonstrations with different dynamics, morphologies, and objects. Videos are available at https://people.eecs.berkeley.edu/~ilija/soil.
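The abstract describes a joint training loop: an inverse dynamics model is fit on the agent's own transitions, used to pseudo-label the state-only demonstrations with predicted actions, and the policy is then updated with an added imitation term. The sketch below is a minimal, hedged reading of that loop, not the authors' implementation; all names (`InverseDynamicsModel`, `joint_update`, `rl_loss_fn`, `bc_weight`) are illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's code) of joint training of an
# inverse dynamics model (IDM) and a policy from state-only demonstrations.
import torch
import torch.nn as nn


class InverseDynamicsModel(nn.Module):
    """Predicts the action a_t from a state transition (s_t, s_{t+1})."""
    def __init__(self, state_dim, action_dim, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, action_dim),
        )

    def forward(self, s, s_next):
        return self.net(torch.cat([s, s_next], dim=-1))


def joint_update(policy, idm, policy_opt, idm_opt,
                 agent_states, agent_next_states, agent_actions,
                 demo_states, demo_next_states, rl_loss_fn, bc_weight=0.1):
    """One joint update under the assumptions above: fit the IDM on the
    agent's own transitions, label the state-only demos with it, and add a
    behavior-cloning term to the usual RL objective."""
    # 1) Train the IDM on (s, s', a) tuples collected by the agent itself.
    idm_loss = nn.functional.mse_loss(
        idm(agent_states, agent_next_states), agent_actions)
    idm_opt.zero_grad()
    idm_loss.backward()
    idm_opt.step()

    # 2) Predict actions for the state-only demonstrations.
    with torch.no_grad():
        pseudo_actions = idm(demo_states, demo_next_states)

    # 3) Policy update: RL loss plus imitation loss on the pseudo-labeled demos.
    bc_loss = nn.functional.mse_loss(policy(demo_states), pseudo_actions)
    total_loss = rl_loss_fn(policy) + bc_weight * bc_loss
    policy_opt.zero_grad()
    total_loss.backward()
    policy_opt.step()
    return idm_loss.item(), bc_loss.item()
```

Here `policy` is assumed to be a state-to-action `nn.Module` and `rl_loss_fn` any differentiable policy-gradient surrogate; the relative weighting of the RL and imitation terms (`bc_weight`) is a free design choice in this sketch.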