We consider the problem of building a state representation model for control in a continual learning setting. As the environment changes, the aim is to efficiently compress the sensory state's information without losing past knowledge, and then use Reinforcement Learning on the resulting features for efficient policy learning. To this end, we propose S-TRIGGER, a general method for Continual State Representation Learning applicable to Variational Auto-Encoders and their many variants. The method is based on Generative Replay, i.e. the use of generated samples to maintain past knowledge. It comes with a statistically sound method for environment change detection, which self-triggers the Generative Replay. Our experiments on VAEs show that S-TRIGGER learns state representations that allow fast and high-performing Reinforcement Learning, while avoiding catastrophic forgetting. The resulting system is capable of autonomously learning new information without using past data and with a bounded system size. Code for our experiments is attached in the Appendix.
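To make the Generative Replay idea concrete, the following is a minimal, hypothetical sketch (not the authors' code): when a new environment is detected, a frozen copy of the previously trained VAE decodes pseudo-samples of past observations, which are mixed with observations from the new environment before retraining. All names (`VAE`, `train_with_generative_replay`) and hyperparameters are illustrative assumptions, and observations are assumed to be flattened and scaled to [0, 1].

```python
# Hypothetical sketch of Generative Replay for a VAE (illustrative, not the authors' implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, obs_dim=64, latent_dim=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent_dim)
        self.logvar = nn.Linear(128, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                 nn.Linear(128, obs_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.dec(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction term + KL divergence to the standard normal prior.
    rec = F.binary_cross_entropy(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld

def train_with_generative_replay(model, new_obs, prev_model=None,
                                 n_replay=256, epochs=10, lr=1e-3):
    """Fit `model` on new-environment observations mixed with pseudo-samples
    decoded from a frozen copy of the previous model (generative replay)."""
    if prev_model is not None:
        with torch.no_grad():
            z = torch.randn(n_replay, prev_model.mu.out_features)
            replay = prev_model.dec(z)  # generated samples standing in for past data
        data = torch.cat([new_obs, replay], dim=0)
    else:
        data = new_obs
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        recon, mu, logvar = model(data)
        loss = vae_loss(recon, data, mu, logvar)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```

In this sketch, the call to `train_with_generative_replay` would be issued only when the change-detection test fires: the current VAE is frozen as `prev_model`, a fresh training round is triggered on the new observations plus the replayed samples, and no past data needs to be stored, keeping the system size bounded.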