Similar or almost-identical industrial machines or tools are often deployed in large numbers, either at once or in quick succession. For instance, a particular model of air compressor may be installed at hundreds of customer sites. Because these machines perform distinct but highly similar tasks, it is desirable to quickly produce a high-quality controller for machine $N+1$ given the controllers already produced for machines $1..N$. This matters even more when the controllers are learned through Reinforcement Learning, as training consumes time, energy and other resources. In this paper, we apply Policy Intersection, a Policy Shaping method, to help a Reinforcement Learning agent learn to solve a new variant of a compressor control problem faster by transferring knowledge from several previously learned controllers. We show that our approach outperforms loading an old controller, and significantly improves performance in the long run.