A platoon is a group of vehicles traveling together in close proximity using automated driving technology. Because of its potential to improve fuel efficiency, driving safety, and driver comfort, platooning has attracted substantial attention from the autonomous vehicle research community. Despite these advantages, recent research has shown that an excessively small intra-platoon gap can impede traffic flow during highway on-ramp merging. Existing control-based methods can adapt the intra-platoon gap to improve traffic flow, but making an optimal control decision under complex traffic dynamics remains challenging because of the high computational cost. In this paper, we present the design, implementation, and evaluation of a novel reinforcement learning framework that adaptively adjusts the intra-platoon gap of an individual platoon member to maximize traffic flow during highway on-ramp merging under dynamically changing, complex traffic conditions. The framework's state space is designed, drawing on the transportation literature, to capture critical traffic parameters that are directly relevant to merging efficiency. An intra-platoon gap decision-making method based on the deep deterministic policy gradient (DDPG) algorithm operates over a continuous action space, which allows precise, continuous adaptation of the intra-platoon gap. An extensive simulation study demonstrates that the reinforcement learning-based approach significantly improves traffic flow in various highway on-ramp merging scenarios.