Many optimal control problems require the simultaneous output of continuous and discrete control variables. Such problems are usually formulated as mixed-integer optimal control (MIOC) problems, which are challenging to solve due to the complexity of the solution space. Numerical methods such as branch-and-bound are computationally expensive and unsuitable for real-time control. This paper proposes a novel continuous-discrete reinforcement learning (CDRL) algorithm, twin delayed deep deterministic actor-Q (TD3AQ), for MIOC problems. TD3AQ combines the advantages of actor-critic and Q-learning methods, and handles continuous and discrete action spaces simultaneously. The proposed algorithm is evaluated on a hybrid electric vehicle (HEV) energy management problem, where real-time control of a continuous variable (engine torque) and a discrete variable (gear ratio) is essential to maximize fuel economy while satisfying driving constraints. Simulation results on different drive cycles show that TD3AQ achieves near-optimal solutions compared with dynamic programming (DP) and outperforms Rainbow, a state-of-the-art discrete RL algorithm adapted to MIOC by discretizing continuous actions into a finite set of discrete values.
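To make the hybrid action structure concrete, the sketch below shows one plausible way an actor-Q agent can emit both action types in a single forward pass: an actor network produces the continuous action, and a Q-network scores every discrete action given the state and that continuous action. This is a minimal illustration, not the paper's implementation; the network sizes, class names, and the example dimensions (an 8-dimensional state, one continuous torque command, six gears) are assumptions.

```python
# Minimal sketch of a hybrid continuous-discrete policy in the spirit of
# TD3AQ. All names and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Maps state -> bounded continuous action (e.g., engine torque)."""

    def __init__(self, state_dim, cont_dim, max_action=1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, cont_dim), nn.Tanh(),
        )
        self.max_action = max_action

    def forward(self, state):
        return self.max_action * self.net(state)


class ActionQ(nn.Module):
    """Scores each discrete action (e.g., gear) given state + continuous action."""

    def __init__(self, state_dim, cont_dim, n_discrete):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + cont_dim, 256), nn.ReLU(),
            nn.Linear(256, n_discrete),
        )

    def forward(self, state, cont_action):
        return self.net(torch.cat([state, cont_action], dim=-1))


# Joint action selection: the actor gives the continuous part; an argmax
# over the Q-values gives the discrete part.
state_dim, cont_dim, n_discrete = 8, 1, 6  # assumed example dimensions
actor = Actor(state_dim, cont_dim)
q_net = ActionQ(state_dim, cont_dim, n_discrete)

state = torch.randn(1, state_dim)
cont_a = actor(state)                     # continuous control, e.g., torque
disc_a = q_net(state, cont_a).argmax(-1)  # discrete control, e.g., gear index
```

Both action components come from one pass over the same state, which is what makes such an architecture suitable for real-time control, in contrast to branch-and-bound, which must search the mixed-integer solution space at every step.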