We consider the problem of demand-side energy management, where each household is equipped with a smart meter that can schedule home appliances online. The goal is to minimize the overall cost under a real-time pricing scheme. While previous works have introduced centralized approaches in which the scheduling algorithm has full observability, we formulate the smart grid environment as a Markov game. Each household is a decentralized agent with partial observability, which enables scalability and privacy preservation in a realistic setting. The grid operator produces a price signal that varies with the energy demand. We propose an extension to a multi-agent, deep actor-critic algorithm to address partial observability and the perceived non-stationarity of the environment from each agent's viewpoint. This algorithm learns a centralized critic that coordinates the training of decentralized agents. Our approach thus uses centralized learning but decentralized execution. Simulation results show that our online deep reinforcement learning method can reduce both the peak-to-average ratio of total energy consumed and the cost of electricity for all households, based purely on instantaneous observations and a price signal.
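To make the centralized-learning, decentralized-execution structure concrete, the following is a minimal sketch (not the paper's implementation) of a MADDPG-style update: decentralized actors act only on their own partial observations, while a centralized critic scores the joint observations and actions during training. The agent count, network sizes, and dimensions below are illustrative assumptions.

```python
# Hypothetical sketch of centralized-critic / decentralized-actor training.
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, ACT_DIM = 3, 8, 2  # assumed: 3 households, local state + price signal

class Actor(nn.Module):
    """Decentralized policy: maps one household's partial observation to its action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.ReLU(), nn.Linear(64, ACT_DIM), nn.Tanh())
    def forward(self, obs):
        return self.net(obs)

class CentralCritic(nn.Module):
    """Centralized critic: evaluates the joint observations and actions (training only)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_AGENTS * (OBS_DIM + ACT_DIM), 128), nn.ReLU(), nn.Linear(128, 1))
    def forward(self, joint_obs, joint_act):
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))

actors = [Actor() for _ in range(N_AGENTS)]
critic = CentralCritic()

# One illustrative actor-update step on a dummy batch of transitions.
batch = 32
obs = torch.randn(batch, N_AGENTS, OBS_DIM)  # per-agent partial observations
acts = torch.stack([actors[i](obs[:, i]) for i in range(N_AGENTS)], dim=1)
q = critic(obs.reshape(batch, -1), acts.reshape(batch, -1))

# Each household's policy ascends the centralized Q-value during training,
# but at execution time it only ever sees its own observation and the price signal.
actor_loss = -q.mean()
actor_loss.backward()
```

Because the critic is discarded at execution time, each smart meter only needs its local observation and the broadcast price signal, which is what permits the scalability and privacy properties claimed above.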