Reinforcement learning techniques can provide substantial insights into the desired behaviors of future autonomous driving systems. By optimizing for societal metrics of traffic such as increased throughput and reduced energy consumption, such methods can derive maneuvers that, if adopted by even a small portion of vehicles, may significantly improve the state of traffic for all vehicles involved. These methods, however, are hindered in practice by the difficulty of designing efficient and accurate models of traffic, as well as the challenges associated with optimizing for the behaviors of dozens of interacting agents. In response to these challenges, this paper tackles the problem of learning generalizable traffic control strategies in simple representations of vehicle driving dynamics. In particular, we look to mixed-autonomy ring roads as depictions of instabilities that result in the formation of congestion. Within this problem, we design a curriculum learning paradigm that exploits the natural extendability of the network to effectively learn behaviors that reduce congestion over long horizons. Next, we study the implications of modeling lane changing on the transferability of policies. Our findings suggest that introducing lane change behaviors that even approximately match trends in more complex systems can significantly improve the generalizability of subsequent learned models to more accurate multi-lane models of traffic.