In meta-reinforcement learning, an agent is trained in multiple different environments and attempts to learn a meta-policy that can efficiently adapt to a new environment. This paper presents RAMP, a Reinforcement learning Agent using Model Parameters, which builds on the idea that a neural network trained to predict environment dynamics encapsulates information about that environment. RAMP is constructed in two phases. In the first phase, a multi-environment parameterized dynamics model is learned. In the second phase, the parameters of that dynamics model are used as context for the multi-environment policy of a model-free reinforcement learning agent.
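The two phases can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and not the paper's implementation: the environments are hypothetical scalar linear systems s' = a*s + b*u, a least-squares fit stands in for the neural dynamics model, and the policy is a fixed, untrained linear map followed by tanh. The function names (`collect_transitions`, `fit_dynamics`, `policy`) are hypothetical.

```python
import math
import random


def collect_transitions(a, b, n_steps, rng):
    """Sample transitions from a hypothetical scalar environment s' = a*s + b*u."""
    data = []
    for _ in range(n_steps):
        s, u = rng.gauss(0, 1), rng.gauss(0, 1)
        data.append((s, u, a * s + b * u))
    return data


def fit_dynamics(data):
    """Phase 1 stand-in: least-squares fit of (a, b) via the 2x2 normal equations."""
    sss = sum(s * s for s, _, _ in data)
    ssu = sum(s * u for s, u, _ in data)
    suu = sum(u * u for _, u, _ in data)
    ssy = sum(s * y for s, _, y in data)
    suy = sum(u * y for _, u, y in data)
    det = sss * suu - ssu * ssu
    a_hat = (ssy * suu - suy * ssu) / det
    b_hat = (suy * sss - ssy * ssu) / det
    return (a_hat, b_hat)


def policy(state, context, weights):
    """Phase 2 stand-in: the policy sees the state plus the model parameters."""
    features = (state, *context)
    return math.tanh(sum(w * x for w, x in zip(weights, features)))


rng = random.Random(0)
env_params = [(0.9, 0.5), (-0.3, 1.2)]  # assumed (a, b) for two toy environments

# Phase 1: fit one dynamics model per environment; its parameters are the context.
contexts = [fit_dynamics(collect_transitions(a, b, 50, rng)) for a, b in env_params]

# Phase 2: one shared policy, conditioned on each environment's context.
policy_weights = [0.4, -1.0, 0.7]  # arbitrary untrained weights, for illustration only
actions = [policy(1.0, context, policy_weights) for context in contexts]
```

In this sketch the same state yields a different action in each environment only because the fitted model parameters differ, which is the role the abstract assigns to the context. In practice the policy weights would be trained with a model-free algorithm instead of being fixed by hand.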