The Dutch power market includes a day-ahead market and an auction-like intraday balancing market. The variability and uncertainty of power supply and demand induce imbalances, which cause prices to differ between these two markets and create an opportunity for arbitrage. In this paper, we present a collaborative dual-agent reinforcement learning (RL) approach for bi-level simulation and optimization of European power arbitrage trading. Moreover, we propose two novel practical implementations that specifically address the electricity market. First, leveraging the concept of imitation learning, we reshape the RL agent's reward to incorporate prior domain knowledge, which results in better convergence during training and improves and generalizes performance. Second, splitting orders into tranches improves the bidding success rate and significantly raises the P&L. We show that each method contributes significantly to the overall performance uplift, and that the integrated methodology achieves about a three-fold improvement in cumulative P&L over the original agent and outperforms the best-performing benchmark policy by around 50%, while remaining computationally efficient.
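The two practical ideas in the abstract, order tranching and a reward that draws on domain knowledge, can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes a simplified settlement in which volume bought day-ahead and not consumed is settled at the imbalance price, so the arbitrage P&L per MW is the imbalance price minus the day-ahead price. The names `Tranche`, `tranche_order`, `arbitrage_pnl`, and `shaped_reward`, the staggered limit-price scheme, and the absolute-deviation penalty toward a rule-based "expert" action are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Tranche:
    volume_mw: float    # volume offered in this tranche (hypothetical unit: MW for one hour)
    limit_price: float  # EUR/MWh; a buy tranche clears if the day-ahead price <= limit


def tranche_order(total_mw: float, base_price: float, n_tranches: int, step: float) -> list[Tranche]:
    """Split one buy order into equal-volume tranches with increasing limit prices (assumed scheme)."""
    vol = total_mw / n_tranches
    return [Tranche(vol, base_price + i * step) for i in range(n_tranches)]


def arbitrage_pnl(tranches: list[Tranche], da_price: float, imb_price: float) -> float:
    """Buy the cleared volume day-ahead and settle it at the imbalance price (simplified assumption)."""
    cleared = sum(t.volume_mw for t in tranches if da_price <= t.limit_price)
    return cleared * (imb_price - da_price)


def shaped_reward(pnl: float, action: float, expert_action: float, weight: float) -> float:
    """P&L reward minus an imitation-style penalty for deviating from a hypothetical expert action."""
    return pnl - weight * abs(action - expert_action)


if __name__ == "__main__":
    # Three 10 MW tranches with limits 50, 55, 60 EUR/MWh versus one 30 MW order at 50 EUR/MWh.
    tranched = tranche_order(30.0, 50.0, 3, 5.0)
    single = tranche_order(30.0, 50.0, 1, 5.0)
    print(arbitrage_pnl(tranched, da_price=52.0, imb_price=70.0))  # -> 360.0 (20 MW cleared)
    print(arbitrage_pnl(single, da_price=52.0, imb_price=70.0))    # -> 0.0 (order not cleared)
    print(shaped_reward(360.0, action=20.0, expert_action=30.0, weight=2.0))  # -> 340.0
```

In this toy setting the single order misses the market entirely when the day-ahead price lands just above its limit, while the tranched order still clears part of its volume. This mirrors, in simplified form, the abstract's claim that tranching raises the bidding success rate. The penalty weight trades off pure P&L against staying close to the domain-knowledge policy.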