Artificial neural networks are promising general function approximators but are challenging to train on non-independent or non-identically distributed data due to catastrophic forgetting. The experience replay buffer, a standard component in deep reinforcement learning, is often used to reduce forgetting and improve sample efficiency by storing past experiences and reusing them for later training. However, a large replay buffer imposes a heavy memory burden, especially on onboard and edge devices with limited memory capacity. To alleviate this problem, we propose memory-efficient reinforcement learning algorithms based on the deep Q-network algorithm. Our algorithms reduce forgetting and maintain high sample efficiency by consolidating knowledge from the target Q-network into the current Q-network. Compared to baseline methods, our algorithms achieve comparable or better performance on both feature-based and image-based tasks while easing the burden of a large experience replay buffer.
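The sketch below illustrates one plausible reading of the consolidation idea described above: the standard DQN temporal-difference loss is augmented with a penalty that keeps the current Q-network's outputs close to the target Q-network's outputs on the same states, so knowledge is not overwritten when only a small replay buffer is available. This is not the authors' reference implementation; the network architecture, the consolidation weight `lam`, and the choice to compute the penalty on replayed states are illustrative assumptions.

```python
import torch
import torch.nn as nn


class QNet(nn.Module):
    """A small feature-based Q-network (architecture is an assumption)."""

    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, x):
        return self.net(x)


def dqn_consolidation_loss(q_net, target_net, batch, gamma=0.99, lam=1.0):
    """Standard DQN TD loss plus a consolidation penalty that regularizes
    the current Q-network toward the target Q-network's predictions."""
    obs, actions, rewards, next_obs, dones = batch

    # Temporal-difference target computed with the (frozen) target network.
    with torch.no_grad():
        next_q = target_net(next_obs).max(dim=1).values
        td_target = rewards + gamma * (1.0 - dones) * next_q

    q_pred = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    td_loss = nn.functional.smooth_l1_loss(q_pred, td_target)

    # Consolidation term: pull all action-values toward the target network's
    # values on the same states, reducing forgetting with a small buffer.
    with torch.no_grad():
        q_old = target_net(obs)
    consolidation = nn.functional.mse_loss(q_net(obs), q_old)

    return td_loss + lam * consolidation
```

In this reading, the target network plays a dual role: it supplies bootstrapped TD targets as in standard DQN and also serves as a memory of previously learned values, so the buffer itself can be much smaller.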