Our team is proposing to run a full-scale energy demand response experiment in an office building. Although this is an exciting endeavor that will provide value to the community, collecting training data for the reinforcement learning agent is costly and will be limited. In this work, we examine how offline training can be leveraged to minimize data costs (i.e., accelerate convergence) and program implementation costs. We present two approaches: pretraining our model on simulated tasks to warm-start the experiment, and using a planning model trained to simulate the real world's rewards for the agent. We present results that demonstrate the utility of offline reinforcement learning for efficient price-setting in the energy demand response problem.