Although Reinforcement Learning (RL) has been shown to produce impressive results, its use is limited by the impact of its hyperparameters on performance. This often makes it difficult to achieve good results in practice. Automated RL (AutoRL) addresses this difficulty, yet little is known about the dynamics of the hyperparameter landscapes that hyperparameter optimization (HPO) methods traverse in search of optimal configurations. Motivated by existing AutoRL approaches that dynamically adjust hyperparameter configurations, we propose an approach to build and analyze these hyperparameter landscapes not just for one point in time but at multiple points throughout training. Addressing an important open question on the legitimacy of such dynamic AutoRL approaches, we provide thorough empirical evidence that the hyperparameter landscapes strongly vary over time for representative algorithms from the RL literature (DQN and SAC) in different kinds of environments (Cartpole and Hopper). This supports the theory that hyperparameters should be dynamically adjusted during training and shows the potential for gaining further insights into AutoRL problems through landscape analyses.