Meta-reinforcement learning (meta-RL) algorithms enable agents to adapt quickly to new tasks from few samples in dynamic environments. Such a feat is achieved through dynamic representations in an agent's policy network (obtained via reasoning about task context, model parameter updates, or both). However, obtaining rich dynamic representations for fast adaptation beyond simple benchmark problems is challenging due to the burden placed on the policy network to accommodate different policies. This paper addresses the challenge by introducing neuromodulation as a modular component that augments a standard policy network, regulating neuronal activities in order to produce efficient dynamic representations for task adaptation. The proposed extension to the policy network is evaluated across multiple discrete and continuous control environments of increasing complexity. To demonstrate the generality and benefits of the extension in meta-RL, the neuromodulated network was applied to two state-of-the-art meta-RL algorithms (CAVIA and PEARL). The results demonstrate that meta-RL augmented with neuromodulation produces significantly better performance and richer dynamic representations in comparison to the baselines.
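To make the idea of a neuromodulatory module "regulating neuronal activities" concrete, the snippet below is a minimal sketch, not the paper's exact architecture: it assumes a modulatory sub-network that multiplicatively gates the activations of a standard fully connected policy layer. The class and attribute names, and the choice of a sigmoid gate, are illustrative assumptions.

```python
import torch
import torch.nn as nn


class NeuromodulatedLayer(nn.Module):
    """Illustrative sketch of a neuromodulated policy-network layer.

    A standard linear layer computes activations, while a parallel
    modulatory sub-network (conditioned on the same input) produces
    per-neuron gating signals that scale those activations, yielding
    input-dependent dynamic representations.
    """

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.standard = nn.Linear(in_features, out_features)   # ordinary policy-network layer
        self.modulator = nn.Linear(in_features, out_features)  # neuromodulatory sub-network (assumed form)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        activations = torch.relu(self.standard(x))
        # Gating signals in (0, 1) regulate neuronal activity; the
        # multiplicative-gate form is an assumption for illustration.
        gate = torch.sigmoid(self.modulator(x))
        return activations * gate
```

In this reading, the neuromodulated layer is a drop-in replacement for a standard hidden layer, so it can augment the policy networks used by context-based methods such as CAVIA or PEARL without changing the rest of the algorithm.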