While reinforcement learning (RL) is gaining popularity in energy systems control, its real-world applications remain limited because the actions produced by learned policies may violate functional requirements or be infeasible for the underlying physical system. In this work, we propose PROjected Feasibility (PROF), a method to enforce convex operational constraints within neural policies. Specifically, we incorporate a differentiable projection layer within a neural network-based policy so that every action the policy outputs is feasible. We then update the policy end-to-end by propagating gradients through this projection layer, making the policy cognizant of the operational constraints. We demonstrate our method on two applications: energy-efficient building operation and inverter control. In the building operation setting, we show that PROF maintains thermal comfort requirements while improving energy efficiency by 4% over state-of-the-art methods. In the inverter control setting, PROF perfectly satisfies voltage constraints on the IEEE 37-bus feeder system while learning to curtail as little renewable energy as possible within its safety set.
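To make the idea of a differentiable projection layer concrete, the following is a minimal sketch and not the implementation used in this work. It assumes the feasible set is a single halfspace {x : w·x <= b}, for which the Euclidean projection has a closed form; general convex constraints would instead require solving a convex program and differentiating through its solution. The function names `project_halfspace` and `project_halfspace_vjp` and the example numbers are hypothetical, chosen only for illustration.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def project_halfspace(a: Vector, w: Vector, b: float) -> Vector:
    """Euclidean projection of a raw action `a` onto {x : w.x <= b}.

    Assumption: the feasible set is one halfspace, so the projection
    is available in closed form.
    """
    violation = dot(w, a) - b
    if violation <= 0.0:
        # Already feasible: the projection leaves the action unchanged.
        return list(a)
    scale = violation / dot(w, w)
    return [ai - scale * wi for ai, wi in zip(a, w)]


def project_halfspace_vjp(a: Vector, w: Vector, b: float,
                          grad_out: Vector) -> Vector:
    """Backward pass: gradient with respect to the raw action `a`,
    given `grad_out`, the gradient with respect to the projected action.

    Inside the set the Jacobian is the identity; outside it is
    I - w w^T / (w.w), which removes the component along w.
    """
    if dot(w, a) - b <= 0.0:
        return list(grad_out)
    scale = dot(w, grad_out) / dot(w, w)
    return [g - scale * wi for g, wi in zip(grad_out, w)]


# Hypothetical example: two actions whose sum may not exceed 1.0.
w, b = [1.0, 1.0], 1.0
raw_action = [1.0, 1.0]            # infeasible output of a policy network
safe_action = project_halfspace(raw_action, w, b)
grad_raw = project_halfspace_vjp(raw_action, w, b, [1.0, 0.0])
```

In this example the infeasible action is moved onto the constraint boundary, and the gradient passed back to the policy has its component along `w` removed. A policy trained through such a layer therefore receives no signal to push further in the constrained direction, which is one way to read "cognizant of the operational constraints".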