Meta-gradient methods (Xu et al., 2018; Zahavy et al., 2020) offer a promising solution to hyperparameter selection and adaptation in non-stationary reinforcement learning problems. However, the properties of meta-gradients in such environments have not been systematically studied. In this work, we bring new clarity to meta-gradients in non-stationary environments. Concretely, we ask: (i) how much information should be given to the learned optimizers to enable faster adaptation and generalization over a lifetime, (ii) what meta-optimizer functions are learned in this process, and (iii) whether meta-gradient methods provide a greater advantage in highly non-stationary environments. To study the effect of information provided to the meta-optimizer, as in recent works (Flennerhag et al., 2021; Almeida et al., 2021), we replace the tuned meta-parameters of fixed update rules with learned meta-parameter functions of selected context features. The context features carry information about agent performance and changes in the environment, and hence can inform learned meta-parameter schedules. We find that adding more contextual information is generally beneficial, leading to faster adaptation of meta-parameter values and increased performance over a lifetime. We support these results with a qualitative analysis of the resulting meta-parameter schedules and learned functions of context features. Lastly, we find that without context, meta-gradients do not provide a consistent advantage over the baseline in highly non-stationary environments. Our findings suggest that contextualizing meta-gradients can play a pivotal role in extracting high performance from them in non-stationary settings.
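To make the replacement concrete, below is a minimal JAX sketch, not the authors' implementation: a fixed update rule whose meta-parameter (here the inner step size, standing in for quantities such as the discount that meta-gradient methods typically adapt) is produced by a learned function of context features, with the function's weights trained by differentiating the post-update loss through the inner update. All names (inner_loss, meta_w, context, etc.) and the toy non-stationary regression task are illustrative assumptions.

```python
# Minimal sketch (assumptions labeled): a meta-parameter as a learned
# function of context features, trained by meta-gradients.
import jax
import jax.numpy as jnp

def inner_loss(theta, x, y):
    # Simple regression loss on the current (possibly non-stationary) task.
    pred = x @ theta
    return jnp.mean((pred - y) ** 2)

def learning_rate(meta_w, context):
    # Learned meta-parameter schedule: map context features (e.g. recent
    # performance) to a positive step size.
    return jax.nn.softplus(meta_w @ context)

def inner_update(theta, meta_w, context, x, y):
    # One step of the fixed update rule, with its meta-parameter produced
    # by the learned context function.
    g = jax.grad(inner_loss)(theta, x, y)
    return theta - learning_rate(meta_w, context) * g

def meta_loss(meta_w, theta, context, x, y, x_val, y_val):
    # Meta-objective: performance after the inner update, differentiated
    # through that update with respect to the context function's weights.
    theta_new = inner_update(theta, meta_w, context, x, y)
    return inner_loss(theta_new, x_val, y_val)

meta_grad = jax.grad(meta_loss)

# Toy lifetime with one abrupt change: the regression target flips sign
# halfway through, mimicking a non-stationary environment.
key = jax.random.PRNGKey(0)
theta = jnp.zeros(3)
meta_w = jnp.array([-2.0, 0.0])  # small initial step size via softplus
for step in range(200):
    key, k1, k2 = jax.random.split(key, 3)
    w_true = jnp.array([1.0, -2.0, 0.5]) * (1.0 if step < 100 else -1.0)
    x = jax.random.normal(k1, (32, 3))
    y = x @ w_true
    x_val = jax.random.normal(k2, (32, 3))
    y_val = x_val @ w_true
    # Context features: a bias term plus a performance signal (recent loss).
    context = jnp.array([1.0, inner_loss(theta, x, y)])
    meta_w = meta_w - 0.01 * meta_grad(meta_w, theta, context, x, y, x_val, y_val)
    theta = inner_update(theta, meta_w, context, x, y)
```

The context vector here carries a performance signal (the recent inner loss), mirroring the context features described above; when the target flips halfway through the lifetime, the meta-gradient can raise the step size in response, illustrating how context enables faster re-adaptation of meta-parameter values.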