Non-stationarity can arise in Reinforcement Learning (RL) even in stationary environments. For example, most RL algorithms collect new data throughout training, using a non-stationary behaviour policy. Because this non-stationarity is transient, it is often not explicitly addressed in deep RL, and a single neural network is simply updated continually. However, we find evidence that neural networks exhibit a memory effect, where these transient non-stationarities can permanently impact the latent representation and adversely affect generalisation performance. Consequently, to improve the generalisation of deep RL agents, we propose Iterated Relearning (ITER). ITER augments standard RL training by repeatedly transferring the knowledge of the current policy into a freshly initialised network, which thereby experiences less non-stationarity during training. Experimentally, we show that ITER improves performance on the challenging generalisation benchmarks ProcGen and Multiroom.
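A minimal sketch of the core idea described above: the current (teacher) policy is periodically distilled into a freshly initialised (student) network, which then takes over training. This is only an illustration under assumed details; the network architecture, hyper-parameters, and the stand-in observation batch are hypothetical and not the authors' implementation.

```python
# Sketch of periodic relearning: distil the current policy into a fresh network.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_policy(obs_dim: int = 8, n_actions: int = 4) -> nn.Module:
    # Hypothetical small policy network; the actual architecture is not specified here.
    return nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

def distil(teacher: nn.Module, student: nn.Module, obs: torch.Tensor,
           steps: int = 100, lr: float = 1e-3) -> nn.Module:
    """Train the student to match the teacher's action distribution on a batch of observations."""
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(steps):
        with torch.no_grad():
            target = F.log_softmax(teacher(obs), dim=-1)   # teacher policy (fixed)
        pred = F.log_softmax(student(obs), dim=-1)         # student policy
        loss = F.kl_div(pred, target, log_target=True, reduction="batchmean")
        opt.zero_grad()
        loss.backward()
        opt.step()
    return student

# Usage: after a phase of standard RL training of `policy`, replace it with a
# distilled fresh copy and continue RL training on the new network.
policy = make_policy()
obs = torch.randn(256, 8)   # stand-in for recently collected observations
policy = distil(teacher=policy, student=make_policy(), obs=obs)
```

The design intent is that the fresh network only ever sees the final, more stationary policy as a distillation target, rather than the full history of non-stationary updates.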