The reinforcement learning (RL) problem is rife with sources of non-stationarity, making it a notoriously difficult problem domain for the application of neural networks. We identify a mechanism by which non-stationary prediction targets can prevent learning progress in deep RL agents: \textit{capacity loss}, whereby networks trained on a sequence of target values lose their ability to quickly update their predictions over time. We demonstrate that capacity loss occurs in a range of RL agents and environments, and is particularly damaging to performance in sparse-reward tasks. We then present a simple regularizer, Initial Feature Regularization (InFeR), that mitigates this phenomenon by regressing a subspace of features towards its value at initialization, leading to significant performance improvements in sparse-reward environments such as Montezuma's Revenge. We conclude that preventing capacity loss is crucial to enable agents to maximally benefit from the learning signals they obtain throughout the entire training trajectory.
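To make the mechanism concrete, the following is a minimal sketch of how an InFeR-style auxiliary loss could be implemented in PyTorch: a set of auxiliary linear heads on the shared feature layer is regressed towards the outputs that a frozen copy of the initial network produces on the same inputs. The wrapper class, the head count, and the \texttt{infer\_weight} coefficient are illustrative assumptions, not the paper's exact formulation.
\begin{verbatim}
# Minimal InFeR-style sketch (illustrative; not the authors' code).
import copy
import torch
import torch.nn as nn

class InFeRWrapper(nn.Module):
    def __init__(self, encoder, feature_dim, num_aux_heads=10):
        super().__init__()
        self.encoder = encoder  # maps observations -> features
        self.aux_heads = nn.Linear(feature_dim, num_aux_heads)
        # Frozen copies of the encoder and heads at initialization
        # provide the regression targets; they are never updated.
        self.init_encoder = copy.deepcopy(encoder)
        self.init_heads = copy.deepcopy(self.aux_heads)
        for p in list(self.init_encoder.parameters()) + \
                 list(self.init_heads.parameters()):
            p.requires_grad_(False)

    def infer_loss(self, obs):
        # Regress current auxiliary outputs towards their values
        # under the initial parameters on the same inputs.
        preds = self.aux_heads(self.encoder(obs))
        with torch.no_grad():
            targets = self.init_heads(self.init_encoder(obs))
        return ((preds - targets) ** 2).mean()

# Assumed usage: total_loss = td_loss + infer_weight * model.infer_loss(obs)
\end{verbatim}
In this sketch the auxiliary loss is simply added to the agent's usual TD objective with a small weight, so the feature subspace spanned by the auxiliary heads retains the expressiveness it had at initialization while the remaining capacity is free to track the changing prediction targets.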