The building sector consumes more energy than any other sector in the world, and there has been considerable research interest in the energy consumption and comfort management of buildings. Inspired by recent advances in reinforcement learning (RL), this paper assesses the potential of RL for building climate control problems with occupant interaction. We apply a recent RL approach, deep deterministic policy gradient (DDPG), to continuous building control tasks and assess its performance in simulation studies in terms of its ability to handle (a) partial state observability due to sensor limitations; (b) complex stochastic systems with high-dimensional state spaces that are jointly continuous and discrete; and (c) uncertainties due to ambient weather conditions, occupant behavior, and occupants' comfort feelings. In particular, the partial observability and uncertainty due to occupant interaction significantly complicate the control problem. In the simulation studies, the policy learned by DDPG demonstrates reasonable performance and computational tractability.