Reinforcement learning (RL), in which an agent learns optimal behaviors simply by interacting with its environment, is achieving tremendous success in a wide variety of applications, from controlling simple pendulums to managing complex data centers. However, the choice of hyperparameters strongly affects the performance and reliability of the deployed inference models that RL produces for decision-making. Hyperparameter search itself is a laborious and computationally expensive process, requiring many iterations to find the settings that yield the best-performing neural network architectures. Compared with other deep learning domains, deep RL has seen relatively little hyperparameter tuning, owing to its algorithmic complexity and the simulation platforms it requires. In this paper, we propose a distributed variable-length genetic algorithm framework that systematically tunes hyperparameters for a range of RL applications, improving training time and architectural robustness through evolution. We demonstrate the scalability of our approach on several RL problems, from simple gym environments to complex applications, and compare it against a Bayesian optimization approach. Our results show that, with more generations, the framework discovers solutions that require fewer training episodes, are computationally cheaper, and are more robust for deployment. These results are an important step toward advancing deep reinforcement learning controllers for real-world problems.
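To make the core idea concrete, the following is a minimal, self-contained Python sketch of a variable-length genetic algorithm for RL hyperparameter search; it is not the paper's implementation. Each genome holds a learning rate, a discount factor, and a variable-length list of hidden-layer sizes, and crossover and mutation are allowed to change the genome length. The `evaluate` function is a hypothetical stand-in for training an RL agent with the given hyperparameters and returning its mean reward.

```python
import random

# Hypothetical stand-in for training an RL agent with the given
# hyperparameters and returning its mean episodic reward.
def evaluate(genome):
    lr, gamma, layers = genome["lr"], genome["gamma"], genome["layers"]
    # Placeholder fitness: prefers a moderate learning rate, a high
    # discount factor, and a smaller network.
    return -abs(lr - 1e-3) * 1e3 + gamma - 0.01 * sum(layers) / 64

def random_genome():
    return {
        "lr": 10 ** random.uniform(-5, -2),           # learning rate
        "gamma": random.uniform(0.9, 0.999),          # discount factor
        "layers": [random.choice([32, 64, 128, 256])  # variable-length part
                   for _ in range(random.randint(1, 4))],
    }

def crossover(a, b):
    # One-point crossover on the layer lists lets offspring length differ
    # from either parent.
    cut_a = random.randint(0, len(a["layers"]))
    cut_b = random.randint(0, len(b["layers"]))
    return {
        "lr": random.choice([a["lr"], b["lr"]]),
        "gamma": random.choice([a["gamma"], b["gamma"]]),
        "layers": (a["layers"][:cut_a] + b["layers"][cut_b:]) or [64],
    }

def mutate(g, rate=0.2):
    if random.random() < rate:
        g["lr"] *= 10 ** random.uniform(-0.5, 0.5)
    if random.random() < rate:
        g["gamma"] = min(0.999, max(0.9, g["gamma"] + random.uniform(-0.01, 0.01)))
    if random.random() < rate:            # grow or shrink the network
        if random.random() < 0.5 and len(g["layers"]) > 1:
            g["layers"].pop(random.randrange(len(g["layers"])))
        else:
            g["layers"].insert(random.randrange(len(g["layers"]) + 1),
                               random.choice([32, 64, 128, 256]))
    return g

def evolve(pop_size=20, generations=10, elite=4):
    population = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=evaluate, reverse=True)
        parents = scored[:elite]          # truncation selection
        children = [mutate(crossover(*random.sample(parents, 2)))
                    for _ in range(pop_size - elite)]
        population = parents + children
    return max(population, key=evaluate)

if __name__ == "__main__":
    best = evolve()
    print("best hyperparameters:", best)
```

Because the fitness evaluations within a generation are independent of one another, they can be dispatched to parallel workers, which is the natural place to introduce the distributed execution the framework relies on.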