Effectively operating an electric vehicle charging station (EVCS) is crucial for enabling the rapid transition to electrified transportation. When this problem is tackled with reinforcement learning (RL), the dimension of the state/action spaces scales with the number of EVs and is thus very large and time-varying. This dimensionality issue affects the efficiency and convergence properties of generic RL algorithms. We develop aggregation schemes based on the urgency of EV charging, namely the laxity value. A least-laxity first (LLF) rule is adopted so that only the total charging power of the EVCS needs to be considered, while the feasibility of individual EV schedules is ensured. In addition, we propose an equivalent state aggregation that is guaranteed to attain the same optimal policy. Based on the proposed representation, a policy gradient method is used to find the best parameters of a linear Gaussian policy. Numerical results validate the performance improvement of the proposed representation approaches in attaining higher rewards and more effective policies compared with an existing approximation-based approach.
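To make the laxity-based aggregation concrete, the following is a minimal sketch (not the paper's implementation) of how a laxity value and an LLF dispatch of the total EVCS charging power could be computed; the field names (`t_rem`, `e_rem`, `p_max`) and the greedy allocation are illustrative assumptions, with laxity taken as the remaining parking time minus the minimum time needed to finish charging at the maximum rate.

```python
def laxity(t_rem, e_rem, p_max):
    """Slack (hours) between departure and the minimum time needed to finish charging.
    t_rem: remaining parking time [h], e_rem: remaining energy demand [kWh], p_max: max charge rate [kW]."""
    return t_rem - e_rem / p_max

def llf_dispatch(total_power, evs):
    """Split the EVCS total charging power among EVs in least-laxity-first order (illustrative)."""
    order = sorted(evs, key=lambda ev: laxity(ev["t_rem"], ev["e_rem"], ev["p_max"]))
    schedule, remaining = {}, total_power
    for ev in order:
        p = min(ev["p_max"], remaining)  # charge the most urgent EVs first, up to their rate limit
        schedule[ev["id"]] = p
        remaining -= p
    return schedule

# Hypothetical example: two EVs sharing 8 kW of total station power
evs = [
    {"id": "EV1", "t_rem": 3.0, "e_rem": 10.0, "p_max": 6.6},
    {"id": "EV2", "t_rem": 5.0, "e_rem": 5.0,  "p_max": 6.6},
]
print(llf_dispatch(total_power=8.0, evs=evs))
```

Under this scheduling rule, the RL agent only needs to decide the total charging power, since the per-EV allocation follows from the laxity ordering.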