Reinforcement learning (RL) provides a powerful framework for decision-making, but its application in practice often requires a carefully designed reward function. Adversarial Imitation Learning (AIL) sheds light on automatic policy acquisition without access to the reward signal from the environment. In this work, we propose Auto-Encoding Adversarial Imitation Learning (AEAIL), a robust and scalable AIL framework. To induce expert policies from demonstrations, AEAIL uses the reconstruction error of an auto-encoder as a reward signal, which provides more information for optimizing policies than prior discriminator-based rewards. Subsequently, we use the derived objective functions to train the auto-encoder and the agent policy. Experiments show that AEAIL outperforms state-of-the-art methods in the MuJoCo environments. More importantly, AEAIL exhibits much better robustness when the expert demonstrations are noisy. Specifically, our method achieves $16.4\%$ and $47.2\%$ relative improvement overall compared to the best baselines, FAIRL and PWIL, on clean and noisy expert data, respectively. Video results, open-source code and dataset are available at https://sites.google.com/view/auto-encoding-imitation.
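As a rough illustration of the reward mechanism described above, the sketch below replaces the usual AIL discriminator with an auto-encoder over state-action pairs and turns its reconstruction error into a per-step reward. This is a minimal sketch only: the module names, network sizes, and the sign convention (lower error, i.e. more expert-like pairs, giving higher reward) are assumptions for illustration and are not taken from the paper's released code.

```python
# Minimal sketch (illustrative, not the authors' released code):
# an MLP auto-encoder over state-action pairs whose reconstruction
# error is converted into a per-step reward for the imitating policy.
import torch
import torch.nn as nn


class AutoEncoder(nn.Module):
    def __init__(self, input_dim: int, latent_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim),
        )

    def reconstruction_error(self, x: torch.Tensor) -> torch.Tensor:
        # Per-sample squared reconstruction error ||x - x_hat||^2.
        x_hat = self.decoder(self.encoder(x))
        return ((x - x_hat) ** 2).sum(dim=-1)


def compute_reward(ae: AutoEncoder, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
    # Hypothetical reward: negate the reconstruction error so that
    # state-action pairs resembling the expert data receive higher reward.
    with torch.no_grad():
        err = ae.reconstruction_error(torch.cat([state, action], dim=-1))
    return -err
```

In the adversarial setup the auto-encoder would be trained to reconstruct expert demonstrations well and agent-generated data poorly, while the policy is optimized (e.g. with an off-the-shelf RL algorithm) against the resulting reward.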