Load-serving entities with storage units have reached sizes and performance levels that can significantly impact clearing prices in electricity markets. Nevertheless, price endogeneity is rarely considered in storage bidding strategies, and modeling the electricity market is a challenging task. Meanwhile, model-free reinforcement learning methods such as Actor-Critic are becoming increasingly popular for designing energy system controllers. Yet their implementation frequently requires lengthy, data-intensive, and unsafe trial-and-error training. To fill these gaps, we implement an online Supervised Actor-Critic (SAC) algorithm, supervised by a model-based controller, Model Predictive Control (MPC). The energy storage agent is trained with this algorithm to bid optimally while learning and adjusting to its impact on market clearing prices. We compare the Supervised Actor-Critic algorithm with the MPC algorithm used as its supervisor, finding that the former reaps higher profits through learning. Our contribution, thus, is an online and safe SAC algorithm that outperforms the current model-based state of the art.
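The supervision scheme described above can be sketched as a blend of a learned actor and a safe model-based supervisor, where the supervisor's weight keeps early training safe. This is a minimal illustrative sketch only: the linear actor, the function names, and the blending rule are assumptions for exposition, not the paper's implementation, and a placeholder stands in for the MPC supervisor.

```python
import numpy as np

rng = np.random.default_rng(0)

def blended_action(a_actor, a_supervisor, k):
    # k in [0, 1]: weight on the learned actor, 1 - k on the supervisor.
    # Early in training k is small, so the safe MPC action dominates.
    return k * a_actor + (1.0 - k) * a_supervisor

def actor_step(theta, state, a_supervisor, td_error, k, lr=0.05, sigma=0.1):
    """One supervised actor update for an illustrative linear actor
    a_actor = theta . state with Gaussian exploration noise."""
    a_actor = float(theta @ state)
    a_explore = a_actor + sigma * rng.standard_normal()
    a_exec = blended_action(a_explore, a_supervisor, k)  # executed (safe) bid
    # RL term: reinforce the exploratory deviation, scaled by the TD error
    # that a critic would supply from observed market profits.
    rl_grad = td_error * (a_explore - a_actor) * state
    # Supervised term: pull the actor toward the supervisor's safe action.
    sup_grad = (a_supervisor - a_actor) * state
    theta = theta + lr * (k * rl_grad + (1.0 - k) * sup_grad)
    return theta, a_exec
```

With k = 0 the agent simply executes and imitates the supervisor; as k grows, the actor's own (market-impact-aware) policy takes over.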