Energy markets can create incentives for undesired behavior by market participants. Multi-agent reinforcement learning (MARL) is a promising new approach for determining the expected behavior of energy market participants. However, reinforcement learning requires many interactions with the system to converge, and the power system environment often involves extensive computations, e.g., optimal power flow (OPF) calculations for market clearing. To tackle this complexity, we provide a model of the energy market to a basic MARL algorithm, in the form of a learned OPF approximation and explicit market rules. The learned OPF surrogate model eliminates the need to solve the OPF explicitly. Our experiments demonstrate that the model additionally reduces training time by about one order of magnitude, at the cost of a slightly worse approximation of the Nash equilibrium. Potential applications of our method include market design, more realistic modeling of market participants, and analysis of manipulative behavior.