Reinforcement learning (RL) is a powerful data-driven control method that has been widely explored in autonomous driving tasks. However, conventional RL approaches learn control policies through trial-and-error interactions with the environment and may therefore cause disastrous consequences, such as collisions, when tested in real traffic. Offline RL has recently emerged as a promising framework for learning effective policies from previously collected, static datasets without requiring active interaction, making it especially appealing for autonomous driving applications. Despite their promise, existing offline RL algorithms such as Batch-Constrained deep Q-learning (BCQ) generally lead to rather conservative policies with limited exploration efficiency. To address these issues, this paper presents an enhanced BCQ algorithm that employs a learnable parameter noise scheme in the perturbation model to increase the diversity of observed actions. In addition, a Lyapunov-based safety enhancement strategy is incorporated to constrain the explorable state space within a safe region. Experimental results in highway and parking traffic scenarios show that our approach outperforms a conventional RL method as well as state-of-the-art offline RL algorithms.
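To make the abstract's key mechanism concrete, the following is a minimal sketch, not the authors' code, of how a learnable parameter-noise scheme might be wired into a BCQ-style perturbation model. It assumes a PyTorch implementation; the names (`NoisyLinear`, `PerturbationModel`), layer widths, and hyperparameter values (`sigma_init`, `phi`) are illustrative assumptions rather than details taken from the paper.

```python
# Illustrative sketch: a BCQ-style perturbation model whose linear layers carry
# learnable Gaussian parameter noise, so the perturbation xi(s, a) varies across
# evaluations and increases the diversity of candidate actions.
import torch
import torch.nn as nn


class NoisyLinear(nn.Module):
    """Linear layer with learnable parameter noise: w = mu + sigma * eps."""

    def __init__(self, in_features, out_features, sigma_init=0.017):
        super().__init__()
        self.weight_mu = nn.Parameter(
            torch.empty(out_features, in_features).uniform_(-0.1, 0.1))
        self.weight_sigma = nn.Parameter(
            torch.full((out_features, in_features), sigma_init))
        self.bias_mu = nn.Parameter(torch.zeros(out_features))
        self.bias_sigma = nn.Parameter(torch.full((out_features,), sigma_init))

    def forward(self, x):
        # Resample noise on every forward pass; sigma itself is trained.
        weight = self.weight_mu + self.weight_sigma * torch.randn_like(self.weight_sigma)
        bias = self.bias_mu + self.bias_sigma * torch.randn_like(self.bias_sigma)
        return nn.functional.linear(x, weight, bias)


class PerturbationModel(nn.Module):
    """xi(s, a): bounded correction added to actions proposed by the generative model."""

    def __init__(self, state_dim, action_dim, max_action, phi=0.05):
        super().__init__()
        self.net = nn.Sequential(
            NoisyLinear(state_dim + action_dim, 256), nn.ReLU(),
            NoisyLinear(256, 256), nn.ReLU(),
            NoisyLinear(256, action_dim),
        )
        self.max_action, self.phi = max_action, phi

    def forward(self, state, action):
        # Perturbation is bounded by phi * max_action, as in standard BCQ.
        xi = self.phi * self.max_action * torch.tanh(
            self.net(torch.cat([state, action], dim=-1)))
        return (action + xi).clamp(-self.max_action, self.max_action)
```

Under this assumed design, the noise scale is a trainable parameter rather than a fixed exploration constant, which is one plausible way to realize the "learnable parameter noise" described above; the Lyapunov-based safety constraint mentioned in the abstract is a separate component not shown here.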