We have developed a model for online continual, or lifelong, reinforcement learning (RL) inspired by the insect brain. Our model leverages offline training of a feature extraction layer and a common general policy layer to enable the convergence of RL algorithms in online settings. Sharing a common policy layer across tasks leads to positive backward transfer: the agent continues to improve on older tasks that share the same underlying general policy. Biologically inspired restrictions on the agent's network are key to the convergence of RL algorithms. This provides a pathway towards efficient online RL in resource-constrained scenarios.