Optimal Energy Management of Plug-in Hybrid Vehicles Through Exploration-to-Exploitation Ratio Control in Ensemble Reinforcement Learning

Intelligent energy management systems that are both highly adaptable and high-performing are essential for hybrid electric vehicles (HEVs). This paper proposes an ensemble learning scheme built on a learning automata module (LAM) to enhance vehicle energy efficiency. Two parallel base learners, each following its own exploration-to-exploitation (E2E) ratio method, are used to generate an optimal solution, and the final action is jointly determined by the LAM using three ensemble methods. Two decay functions, 'reciprocal function-based decay' (RBD) and 'step-based decay' (SBD), are proposed to generate E2E ratio trajectories, building on the conventional exponential decay (EXD) function used in reinforcement learning. Furthermore, because the three decay functions perform differently, an optimal combination of RBD, SBD, and EXD is employed to determine the ultimate action. Experiments are carried out in software-in-the-loop (SiL) and hardware-in-the-loop (HiL) settings to validate the energy-saving potential under four predefined driving cycles. The SiL test demonstrates that the ensemble learning system with the optimal combination achieves 1.09$\%$ higher vehicle energy efficiency than a single Q-learning strategy with the EXD function. In the HiL test, the ensemble learning system with the optimal combination saves more than 1.04$\%$ energy under the predefined real-world driving condition compared with the single Q-learning scheme based on the EXD function.
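
To make the role of the three decay functions concrete, the sketch below shows one plausible way to generate E2E ratio trajectories over training episodes. The abstract does not give the functional forms or parameter values, so everything here is an assumption made for illustration: the function names `exd`, `rbd`, and `sbd`, the parameters `eps0`, `rate`, `drop`, and `step`, and in particular the formulas chosen for RBD and SBD, which are generic reciprocal and staircase schedules rather than the authors' definitions.

```python
import math


def exd(episode, eps0=1.0, rate=0.05):
    """Exponential decay: eps0 * exp(-rate * episode)."""
    return eps0 * math.exp(-rate * episode)


def rbd(episode, eps0=1.0, rate=0.05):
    """Reciprocal-style decay (assumed form): eps0 / (1 + rate * episode)."""
    return eps0 / (1.0 + rate * episode)


def sbd(episode, eps0=1.0, drop=0.5, step=20):
    """Step-style decay (assumed form): scale by `drop` once every `step` episodes."""
    return eps0 * drop ** (episode // step)


if __name__ == "__main__":
    # Print the exploration ratio each schedule yields at a few episodes.
    for ep in (0, 20, 40, 100):
        print(ep, round(exd(ep), 4), round(rbd(ep), 4), round(sbd(ep), 4))
```

Under these assumed forms, the reciprocal schedule keeps a heavier tail of exploration than the exponential one, while the step schedule holds the ratio constant within each block of episodes. Differences of this kind are what would make combining base learners with distinct schedules worthwhile.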
