In this paper, we investigate the scheduling problem of diesel generators (DGs) in an Internet of Things (IoT)-driven isolated microgrid (MG) via deep reinforcement learning (DRL). Renewable energy is fully exploited under the uncertainty of renewable generation and load demand. The DRL agent learns an optimal policy from historical renewable and load data of previous days; the learned policy generates real-time decisions based on observations of renewable and load data from previous hours, collected by connected sensors. The goal is to reduce operating cost while ensuring supply-demand balance. Specifically, a novel finite-horizon partially observable Markov decision process (POMDP) model is conceived that accounts for the spinning reserve. To overcome the challenge of the discrete-continuous hybrid action space, arising from the binary DG switching decision and the continuous energy dispatch (ED) decision, a DRL algorithm, namely hybrid-action finite-horizon recurrent deterministic policy gradient (HAFH-RDPG), is proposed. HAFH-RDPG seamlessly integrates two classical DRL algorithms, i.e., deep Q-network (DQN) and recurrent deterministic policy gradient (RDPG), within a finite-horizon dynamic programming (DP) framework. Extensive experiments are performed with real-world data from an IoT-driven MG to evaluate the proposed algorithm's ability to handle the uncertainty caused by inter-hour and inter-day power fluctuations and to compare its performance with that of benchmark algorithms.
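To illustrate the hybrid action structure the abstract describes, the following is a minimal sketch of how a discrete switching decision (DQN-style argmax over on/off combinations) can be combined with a continuous dispatch decision (deterministic-actor-style output) in one action-selection step. The linear "networks" `W_q` and `W_a`, the dimensions, and the function name are hypothetical stand-ins for illustration only, not the paper's actual HAFH-RDPG implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

N_DG = 3     # number of diesel generators (hypothetical)
OBS_DIM = 8  # observation: past renewable and load readings (hypothetical)

# Toy stand-ins for the learned networks: random linear maps.
W_q = rng.normal(size=(2 ** N_DG, OBS_DIM))  # "DQN": one Q-value per on/off combination
W_a = rng.normal(size=(N_DG, OBS_DIM))       # "actor": continuous dispatch per DG

def select_hybrid_action(obs):
    """Discrete switching via argmax over Q-values; continuous dispatch via an actor."""
    q = W_q @ obs
    k = int(np.argmax(q))  # index of the best on/off combination
    on_off = np.array([(k >> i) & 1 for i in range(N_DG)])
    dispatch = np.tanh(W_a @ obs) * 0.5 + 0.5  # dispatch fraction mapped into [0, 1]
    return on_off, on_off * dispatch           # dispatch only for DGs switched on

obs = rng.normal(size=OBS_DIM)
on_off, power = select_hybrid_action(obs)
```

The point of the sketch is the factorization: the discrete head enumerates switching combinations while the continuous head outputs a dispatch level per generator, and the two are composed into one hybrid action.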