Multi-agent Deep Reinforcement Learning for Charge-sustaining Control of Multi-mode Hybrid Vehicles

Transportation electrification requires an increasing number of electric components (e.g., electric motors and electric energy storage systems) on vehicles, and control of electric powertrains usually involves multiple inputs and multiple outputs (MIMO). This paper focuses on the online optimization of the energy management strategy for a multi-mode hybrid electric vehicle (HEV) using multi-agent reinforcement learning (MARL) algorithms, which address MIMO control optimization, whereas most existing methods deal only with single-output control. A new collaborative cyber-physical learning scheme with multiple agents is proposed, based on an analysis of how the energy efficiency of the multi-mode HEV evolves when optimized by a deep deterministic policy gradient (DDPG)-based MARL algorithm. A learning driving cycle is then set by a novel random method to speed up the training process. Finally, network design, learning rate, and policy noise are included in a sensitivity analysis to determine the parameters of the DDPG-based algorithm. The learning performance under different relationships between the agents is also studied, showing that a not completely independent relationship with a ratio of 0.2 performs best. A comparison study of the single-agent and multi-agent schemes suggests that the multi-agent scheme achieves an approximately 4% improvement in total energy over the single-agent scheme. Multi-objective control by MARL can therefore achieve good optimization effects and application efficiency.
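The abstract does not define how the relationship ratio between agents enters the learning problem. As a minimal sketch of one plausible reading, the Python snippet below blends each agent's own reward with the team-average reward, so that a ratio of 0 gives fully independent agents and a ratio of 1 gives a fully shared reward. The function name `blend_rewards`, the blending formula, and the example reward values are all illustrative assumptions, not the authors' formulation.

```python
from typing import List


def blend_rewards(own_rewards: List[float], ratio: float) -> List[float]:
    """Blend each agent's own reward with the team-average reward.

    Hypothetical helper (an assumption, not taken from the paper):
      ratio = 0.0 -> each agent keeps only its own reward (independent)
      ratio = 1.0 -> every agent receives the team average (fully shared)
    """
    if not own_rewards:
        raise ValueError("own_rewards must contain at least one reward")
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")

    team_mean = sum(own_rewards) / len(own_rewards)
    # Convex combination of the individual reward and the shared reward.
    return [(1.0 - ratio) * r + ratio * team_mean for r in own_rewards]


if __name__ == "__main__":
    # Made-up per-step rewards for two agents (illustrative values only).
    rewards = [1.0, 3.0]
    print(blend_rewards(rewards, ratio=0.2))
```

Under this assumed formulation, a ratio of 0.2 keeps each agent mostly focused on its own objective while still coupling it weakly to the other agent, which matches the "not completely independent" wording only in spirit.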
