Hidden parameters are latent variables in reinforcement learning (RL) environments that are constant over the course of a trajectory. Understanding which hidden parameters, if any, affect a particular environment can aid both the development and the appropriate use of RL systems. We present an unsupervised method to map RL trajectories into a feature space where distance represents the relative difference in system behavior due to hidden parameters. Our approach disentangles the effects of hidden parameters by leveraging a recurrent neural network (RNN) world model of the kind used in model-based RL. First, we alter the standard world model training algorithm to isolate the hidden parameter information in the world model memory. Then, we use a metric learning approach to map the RNN memory into a space whose distance metric approximates a bisimulation metric with respect to the hidden parameters. The resulting disentangled feature space can be used to meaningfully relate trajectories to each other and to analyze the hidden parameter. We demonstrate our approach on four hidden parameters across three RL environments. Finally, we present two methods to help identify and understand the effects of hidden parameters on systems.
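To make the idea of a distance that reflects "difference in system behavior due to hidden parameters" concrete, the following is a minimal toy sketch, not the method of the paper. It assumes a hypothetical one-dimensional system whose hidden parameter is a constant `gain`, and it uses a crude behavioral distance (the mean absolute gap between states reached under the same action sequence) as a stand-in for a bisimulation-style target. The names `rollout` and `behavioral_distance` are illustrative assumptions.

```python
def rollout(gain, actions, x0=0.0):
    """Simulate the toy system x_{t+1} = x_t + gain * a_t.

    `gain` plays the role of the hidden parameter: it is fixed for the
    whole trajectory and is never observed directly.
    """
    states = [x0]
    for a in actions:
        states.append(states[-1] + gain * a)
    return states


def behavioral_distance(gain_a, gain_b, actions):
    """Mean absolute gap between the state sequences that two hidden
    parameter values produce under the same actions.

    This is an assumed, simplified proxy for a bisimulation-style
    distance, chosen only for illustration.
    """
    states_a = rollout(gain_a, actions)
    states_b = rollout(gain_b, actions)
    return sum(abs(p - q) for p, q in zip(states_a, states_b)) / len(states_a)


# Shared action sequence applied to every hidden parameter value.
actions = [1.0, 0.5, 1.0, 0.5]
gains = [0.5, 1.0, 2.0]

# Pairwise distance matrix: entry [i][j] compares gains[i] with gains[j].
dist = [[behavioral_distance(g, h, actions) for h in gains] for g in gains]

for row in dist:
    print(["%.2f" % d for d in row])
```

In this toy setting the distance is zero for identical gains and grows with the gap between gains, which is the qualitative property the learned feature space is described as having. A metric learning step would then train an embedding of the RNN memory so that embedding distances match targets of this kind.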