Deep Reinforcement Learning (or simply "RL") is gaining popularity for industrial and research applications. However, it still suffers from key limitations that slow its widespread adoption: its performance is sensitive to initial conditions and to non-determinism. To address these challenges, we propose a procedure for building ensembles of RL agents that efficiently turn better local decisions into higher long-term cumulative rewards. For the first time, hundreds of experiments have been run to compare different ensemble construction procedures in two electricity control environments. We find that an ensemble of 4 agents improves accumulated rewards by 46%, improves reproducibility by a factor of 3.6, and can naturally and efficiently train and predict in parallel on GPUs and CPUs.
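As a rough illustration of how an ensemble of agents might combine per-agent decisions into one action, here is a minimal sketch using majority voting over discrete actions. The agent interface, the voting rule, and all names are assumptions for illustration, not the paper's exact procedure:

```python
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical agent interface: each agent maps an observation to a
# discrete action index. The aggregation rule below (majority vote)
# is an assumed example, not necessarily the procedure studied here.
Agent = Callable[[Sequence[float]], int]

def ensemble_act(agents: List[Agent], obs: Sequence[float]) -> int:
    """Combine per-agent greedy actions into one ensemble decision."""
    votes = [agent(obs) for agent in agents]      # each agent decides independently (parallelizable)
    action, _ = Counter(votes).most_common(1)[0]  # pick the most popular action
    return action

# Usage with four trained agents, matching the reported 4-agent ensemble:
# agents = [agent_1, agent_2, agent_3, agent_4]
# a = ensemble_act(agents, observation)
```

Because each agent's forward pass is independent, the per-agent decisions can be computed in parallel across GPUs or CPU cores, which is consistent with the parallel training and prediction claim above.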