Batch reinforcement learning (BRL) is an emerging research area in the RL community. It learns exclusively from static datasets (i.e., replay buffers) without interacting with the environment. In the offline setting, existing replay experiences serve as prior knowledge from which BRL models find the optimal policy. Generating replay buffers is therefore crucial for benchmarking BRL models. Our B2RL (Building Batch RL) dataset contains real-world data collected from our building management systems, as well as buffers generated by several behavioral policies in simulation environments. We believe it can help building experts conduct BRL research. To the best of our knowledge, we are the first to open-source building datasets for BRL.