The unit commitment (UC) problem, which determines the operating schedules of generation units to meet demand, is a fundamental task in power systems operation. Existing UC methods based on mixed-integer programming are not well suited to highly stochastic systems. Approaches that account for uncertainty more rigorously could yield large reductions in operating costs by reducing spinning reserve requirements, operating power stations at higher efficiencies, and integrating greater volumes of variable renewables. A promising approach to solving the UC problem is reinforcement learning (RL), a methodology for optimal decision-making that has been used to conquer long-standing grand challenges in artificial intelligence. This thesis explores the application of RL to the UC problem and addresses challenges including robustness under uncertainty, generalisability across multiple problem instances, and scaling to larger power systems than previously studied. To tackle these issues, we develop guided tree search, a novel methodology combining model-free RL and model-based planning. The UC problem is formalised as a Markov decision process, and we develop an open-source environment based on real data from Great Britain's power system to train RL agents. In problems of up to 100 generators, guided tree search is shown to be competitive with deterministic UC methods, reducing operating costs by up to 1.4\%. An advantage of RL is that the framework can easily be extended to incorporate considerations important to power system operators, such as robustness to generator failure, wind curtailment or carbon prices. When generator outages are considered, guided tree search saves over 2\% in operating costs compared with methods using conventional $N-x$ reserve criteria.
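To make the idea of combining a policy with tree search more concrete, the following is a toy sketch, not the thesis's implementation. It assumes a hypothetical three-generator system with made-up capacities, costs and demand, uses simple merit-order dispatch, and replaces the trained RL policy with a hand-coded heuristic (`policy_prior`, hypothetical) that ranks commitment decisions. The depth-limited search then expands only the `branching` top-ranked decisions at each period, and the schedule is built in receding-horizon fashion.

```python
import itertools

# Hypothetical toy system: (capacity MW, marginal cost £/MWh, start-up cost £, no-load cost £/period)
GENERATORS = [(100.0, 20.0, 500.0, 100.0), (80.0, 35.0, 200.0, 50.0), (50.0, 60.0, 50.0, 20.0)]
DEMAND = [120.0, 150.0, 90.0, 160.0]  # hypothetical demand per period (MW)
VOLL = 1000.0  # hypothetical penalty for unserved energy (£/MWh)
ACTIONS = list(itertools.product((0, 1), repeat=len(GENERATORS)))  # all on/off vectors


def dispatch_cost(commitment, demand):
    """Merit-order dispatch of committed units; unserved demand is penalised at VOLL."""
    remaining, cost = demand, 0.0
    units = sorted((g for g, on in zip(GENERATORS, commitment) if on), key=lambda g: g[1])
    for cap, marginal, _, _ in units:
        output = min(cap, remaining)
        cost += output * marginal
        remaining -= output
    return cost + remaining * VOLL


def step_cost(prev, action, demand):
    """MDP transition cost: start-ups for units switched on, no-load and dispatch costs."""
    startup = sum(g[2] for g, p, a in zip(GENERATORS, prev, action) if a and not p)
    no_load = sum(g[3] for g, a in zip(GENERATORS, action) if a)
    return startup + no_load + dispatch_cost(action, demand)


def policy_prior(state, t):
    """Hypothetical stand-in for a trained RL policy: rank actions by how closely committed
    capacity matches demand plus a 10% margin (a learned policy would also use `state`)."""
    target = 1.1 * DEMAND[t]

    def capacity(a):
        return sum(g[0] for g, on in zip(GENERATORS, a) if on)

    return sorted(ACTIONS, key=lambda a: abs(capacity(a) - target))


def guided_search(state, t, depth, branching):
    """Depth-limited search expanding only the `branching` highest-ranked actions.
    Returns (best cost over the searched horizon, best first action)."""
    if depth == 0 or t == len(DEMAND):
        return 0.0, None
    best_cost, best_action = float("inf"), None
    for action in policy_prior(state, t)[:branching]:
        future, _ = guided_search(action, t + 1, depth - 1, branching)
        total = step_cost(state, action, DEMAND[t]) + future
        if total < best_cost:
            best_cost, best_action = total, action
    return best_cost, best_action


def schedule(initial, depth=2, branching=3):
    """Receding horizon: search from each period and commit only the first action."""
    state, total, plan = initial, 0.0, []
    for t in range(len(DEMAND)):
        _, action = guided_search(state, t, depth, branching)
        total += step_cost(state, action, DEMAND[t])
        plan.append(action)
        state = action
    return plan, total


plan, total = schedule((1, 0, 0))
print(plan, round(total, 2))
```

Restricting the branching factor is what lets the policy keep the search tractable: with $2^G$ possible commitment vectors for $G$ generators, exhaustive search quickly becomes infeasible, whereas a good policy concentrates the search on a few promising decisions. Setting `branching=len(ACTIONS)` and `depth=len(DEMAND)` recovers exhaustive search over this toy horizon.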