Sample efficiency has long been a key issue in reinforcement learning (RL). An efficient agent must be able to leverage its prior experience to quickly adapt to new but similar tasks and situations. Meta-RL is one attempt to formalize and address this issue. Inspired by recent progress in meta-RL, we introduce BIMRL, a novel multi-layer architecture together with a novel brain-inspired memory module that helps agents quickly adapt to new tasks within a few episodes. We also use this memory module to design a novel intrinsic reward that guides the agent's exploration. Our architecture is inspired by findings in cognitive neuroscience and is consistent with current knowledge of the connectivity and functionality of different brain regions. We empirically validate the effectiveness of the proposed method by matching or surpassing the performance of several strong baselines on multiple MiniGrid environments.