The recently emerged multi-mode plug-in hybrid electric vehicle (PHEV) technology is one of the pathways contributing to decarbonization, and its energy management requires multiple-input multiple-output (MIMO) control. At present, existing methods usually decouple the MIMO control into multiple-input single-output (MISO) control and can therefore achieve only locally optimal performance. To optimize the multi-mode vehicle globally, this paper studies a MIMO control method for energy management of the multi-mode PHEV based on multi-agent deep reinforcement learning (MADRL). By introducing a relevance ratio, a hand-shaking strategy is proposed to enable two learning agents to work collaboratively under the MADRL framework using the deep deterministic policy gradient (DDPG) algorithm. Unified settings for the DDPG agents are obtained through a sensitivity analysis of the factors influencing learning performance, and the optimal working mode of the hand-shaking strategy is identified through a parametric study on the relevance ratio. The advantage of the proposed energy management method is demonstrated on a software-in-the-loop testing platform. The results indicate that the learning rate of the DDPG agents has the greatest influence on learning performance. With the unified DDPG settings and a relevance ratio of 0.2, the proposed MADRL method can save up to 4% energy compared to the single-agent method.
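The abstract does not spell out how the relevance ratio couples the two agents, so the following Python sketch shows one plausible reading of the hand-shaking idea: each agent controls one powertrain output and receives a reward blended with the other agent's reward through the relevance ratio. The class names, the toy environment, and the reward terms are illustrative assumptions, not the paper's actual models.

```python
import random

# Reported as the best-performing relevance ratio in the abstract.
RELEVANCE_RATIO = 0.2


class PlaceholderAgent:
    """Stand-in for a DDPG agent (actor/critic networks and replay buffer omitted)."""

    def act(self, state):
        # A real DDPG actor would map the state to a continuous control action.
        return random.uniform(-1.0, 1.0)

    def store(self, state, action, reward, next_state):
        # A real agent would push this transition to its replay buffer and
        # periodically update its actor and critic networks.
        pass


def handshake_rewards(r1, r2, ratio):
    """Blend each agent's own reward with the other agent's reward (assumed coupling)."""
    blended_r1 = (1.0 - ratio) * r1 + ratio * r2
    blended_r2 = (1.0 - ratio) * r2 + ratio * r1
    return blended_r1, blended_r2


def step_env(state, a1, a2):
    """Toy environment step: returns next state and per-agent energy costs."""
    next_state = [s + 0.01 * (a1 + a2) for s in state]
    r1 = -abs(a1)  # e.g., fuel cost attributed to agent 1's action
    r2 = -abs(a2)  # e.g., battery usage cost attributed to agent 2's action
    return next_state, r1, r2


agent1, agent2 = PlaceholderAgent(), PlaceholderAgent()
state = [0.5, 0.0]  # e.g., normalized battery SOC and vehicle speed

for _ in range(100):
    a1, a2 = agent1.act(state), agent2.act(state)
    next_state, r1, r2 = step_env(state, a1, a2)
    br1, br2 = handshake_rewards(r1, r2, RELEVANCE_RATIO)
    agent1.store(state, a1, br1, next_state)
    agent2.store(state, a2, br2, next_state)
    state = next_state
```

With a relevance ratio of 0, each agent optimizes only its own objective (equivalent to decoupled MISO control); a ratio of 0.2, as reported in the abstract, introduces partial coupling so the two agents account for each other's cost during learning.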