Learning from demonstrations and then outperforming the demonstrator is the advanced goal of inverse reinforcement learning (IRL), referred to as beyond-demonstrator IRL (BD-IRL). BD-IRL offers a new way to build expert systems that avoids the difficulty of reward function design and reduces computation costs. Currently, most BD-IRL algorithms are two-stage: they first infer a reward function and then learn a policy via reinforcement learning (RL). Because of these two separate procedures, two-stage algorithms have high computational complexity and low robustness. To overcome these flaws, we propose a BD-IRL framework called hybrid adversarial inverse reinforcement learning (HAIRL), which integrates reward learning and exploration into a single procedure. Simulation results show that HAIRL is more efficient and more robust than comparable state-of-the-art (SOTA) algorithms.