Although Reinforcement Learning (RL) has shown impressive results in games and simulation, real-world applications of RL suffer from instability under changing environmental conditions and hyperparameters. We give a first impression of the extent of this instability by showing that the hyperparameters found by automatic hyperparameter optimization (HPO) methods depend not only on the problem at hand, but also on how well the state describes the environment dynamics. Specifically, we show that agents in contextual RL require different hyperparameters if they are shown how environmental factors change. In addition, finding adequate hyperparameter configurations is not equally easy for both settings, further highlighting the need for research into how hyperparameters influence learning and generalization in RL.