Making decisions in complex driving environments is a challenging task for autonomous agents. Imitation learning methods have great potential for achieving this goal. Adversarial Inverse Reinforcement Learning (AIRL) is one of the state-of-the-art imitation learning methods that can learn a behavioral policy and a reward function simultaneously, yet it has only been demonstrated in simple and static environments where no interactions are introduced. In this paper, we improve and stabilize AIRL's performance by augmenting it with semantic rewards in the learning framework. Additionally, we adapt the augmented AIRL to a more practical and challenging decision-making task in a highly interactive environment in autonomous driving. The proposed method is compared with four baselines and evaluated by four performance metrics. Simulation results show that the augmented AIRL outperforms all the baseline methods, and its performance is comparable with that of the experts on all four metrics.
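The core idea, combining the reward recovered by AIRL's discriminator with a hand-designed semantic reward, can be sketched as follows. This is a minimal illustration only: the additive weighting scheme and the example semantic penalty are assumptions for exposition, not the paper's exact design.

```python
import math

def airl_reward(logit: float) -> float:
    """AIRL recovers a reward from the discriminator as
    r = log D - log(1 - D). When D = sigmoid(logit), this
    simplifies to the logit itself."""
    d = 1.0 / (1.0 + math.exp(-logit))
    return math.log(d) - math.log(1.0 - d)

def augmented_reward(logit: float, semantic_reward: float,
                     weight: float = 0.5) -> float:
    """Augment the learned AIRL reward with a semantic reward term
    (e.g., penalties for collisions or off-road driving). The linear
    combination and the weight value are illustrative assumptions."""
    return airl_reward(logit) + weight * semantic_reward

# Illustration: a discriminator logit of 1.2 combined with a
# semantic penalty of -2.0 (e.g., for an unsafe maneuver).
r = augmented_reward(1.2, -2.0, weight=0.5)
```

In this sketch, the semantic term injects domain knowledge (safety, lane keeping) that the adversarially learned reward alone may capture unreliably, which is consistent with the abstract's claim that the augmentation improves and stabilizes AIRL.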