Thanks to their data-driven and model-free nature, Deep Reinforcement Learning (DRL) algorithms have the potential to cope with the increasing level of uncertainty introduced by renewable-based generation. To handle an energy system's operational cost and technical constraints (e.g., the generation-demand power balance) simultaneously, DRL algorithms must trade these objectives off when designing the reward function. This trade-off introduces extra hyperparameters that affect the DRL algorithms' performance and their ability to provide feasible solutions. In this paper, a performance comparison of different DRL algorithms, including DDPG, TD3, SAC, and PPO, is presented. We aim to provide a fair comparison of these DRL algorithms for energy system optimal scheduling problems. Results show that DRL algorithms can provide good-quality solutions in real time, even in unseen operational scenarios, when compared with a mathematical programming model of the energy system optimal scheduling problem. Nevertheless, under large peak consumption, these algorithms failed to provide feasible solutions, which can impede their practical implementation.
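To make the cost-constraint trade-off concrete, the sketch below shows one common way such a reward can be structured: the negative operational cost is combined with a penalty on the generation-demand imbalance, and the penalty weight is the kind of extra hyperparameter the abstract refers to. This is an illustrative assumption, not the reward function used in the paper; the function name and parameters are hypothetical.

```python
import numpy as np

def reward(operational_cost: float,
           generation: np.ndarray,
           demand: np.ndarray,
           penalty_weight: float = 100.0) -> float:
    """Illustrative penalized reward (assumed form, not the authors' exact design).

    Combines the (negative) operational cost with a penalty on the
    generation-demand power imbalance. `penalty_weight` is the extra
    hyperparameter: too small a value tolerates infeasible (unbalanced)
    schedules, too large a value can slow learning of cost-efficient behavior.
    """
    imbalance = abs(float(np.sum(generation)) - float(np.sum(demand)))
    return -operational_cost - penalty_weight * imbalance
```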