Although Reinforcement Learning (RL) has been shown to be capable of producing impressive results, its use is limited by the impact of its hyperparameters on performance. This often makes it difficult to achieve good results in practice. Automated RL (AutoRL) addresses this difficulty, yet little is known about the dynamics of the hyperparameter landscapes that hyperparameter optimization (HPO) methods traverse in search of optimal configurations. Since existing AutoRL approaches dynamically adjust hyperparameter configurations, we propose an approach for building and analyzing these hyperparameter landscapes not just at one point in time but at multiple points throughout training. Addressing an important open question on the legitimacy of such dynamic AutoRL approaches, we provide thorough empirical evidence that the hyperparameter landscapes vary strongly over time for representative algorithms from the RL literature (DQN and SAC) on different kinds of environments (Cartpole and Hopper). This supports the theory that hyperparameters should be dynamically adjusted during training and shows the potential for gaining further insights into AutoRL problems through landscape analyses.