Eco-driving strategies have been shown to provide significant reductions in fuel consumption. This paper outlines an active driver-assistance approach in which a residual policy learning (RPL) agent provides residual actions to default powertrain controllers while balancing fuel consumption against other driver-accommodation objectives. Drawing on previous experiences, our RPL agent learns improved traction-torque and gear-shifting residual policies that adapt the powertrain's operation to variations and uncertainties in the environment. For comparison, we consider a traditional reinforcement learning (RL) agent trained from scratch. Both agents employ the off-policy Maximum A Posteriori Policy Optimization (MPO) algorithm with an actor-critic architecture. Implementing both agents on a simulated commercial vehicle in various car-following scenarios, we find that the RPL agent quickly learns policies that are significantly better than the baseline source policy, though by some measures not as good as those eventually attainable by the RL agent trained from scratch.
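To make the residual-action idea concrete, the minimal sketch below shows how an RPL agent's final command can be composed as the default controller's output plus a learned residual, so that learning starts from the baseline controller's behavior. All function and variable names here are hypothetical illustrations, not the paper's implementation, and the MPO actor is replaced by a simple linear stand-in.

```python
import numpy as np

def base_controller(obs: np.ndarray) -> np.ndarray:
    """Default (source) powertrain policy: placeholder torque/shift map."""
    torque_cmd = 0.5 * obs[0]   # placeholder traction-torque demand
    gear_cmd = 0.0              # placeholder gear-shift signal
    return np.array([torque_cmd, gear_cmd])

def residual_policy(obs: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Learned residual actor (linear stand-in for the MPO actor network)."""
    return theta @ obs

def rpl_action(obs: np.ndarray, theta: np.ndarray) -> np.ndarray:
    # Core RPL idea: final action = default action + learned residual.
    return base_controller(obs) + residual_policy(obs, theta)

# Usage: with a zero residual, the RPL agent reproduces the baseline policy.
obs = np.array([0.3, -0.1, 0.8])   # e.g., speed, gap error, power demand
theta = np.zeros((2, 3))           # untrained residual parameters
print(rpl_action(obs, theta))      # equals base_controller(obs) initially
```

This additive structure is one common reading of why RPL can learn quickly: the agent begins at the source policy's performance and only needs to learn corrections, rather than discovering torque and gear-shift behavior from scratch.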