The generalization ability of most meta-reinforcement learning (meta-RL) methods is largely limited to test tasks sampled from the same distribution used to sample training tasks. To overcome this limitation, we propose Latent Dynamics Mixture (LDM), which trains a reinforcement learning agent with imaginary tasks generated from mixtures of learned latent dynamics. By training a policy on these mixture tasks alongside the original training tasks, LDM allows the agent to prepare for unseen test tasks during training and prevents it from overfitting to the training tasks. LDM significantly outperforms standard meta-RL methods in test returns on gridworld navigation and MuJoCo tasks where the training task distribution and the test task distribution are strictly separated.
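To make the mixture idea concrete, here is a minimal, illustrative sketch and not the authors' implementation: latent dynamics vectors inferred from training tasks are combined with randomly sampled convex weights to form an "imaginary" task latent, which a learned dynamics decoder could then roll out to generate imaginary transitions. The Dirichlet prior, dimensions, and the `sample_mixture_latent` helper are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

num_train_tasks, latent_dim = 4, 8
# Hypothetical latent dynamics embeddings, one per training task
# (in practice these would come from a learned task/dynamics encoder).
task_latents = rng.normal(size=(num_train_tasks, latent_dim))

def sample_mixture_latent(task_latents, alpha=1.0, rng=rng):
    """Draw convex mixture weights from a Dirichlet prior and mix the task latents."""
    weights = rng.dirichlet(alpha * np.ones(len(task_latents)))
    return weights @ task_latents  # latent for an imaginary task

imaginary_z = sample_mixture_latent(task_latents)
print(imaginary_z.shape)  # (8,) -- would be fed to a learned decoder to imagine transitions
```

Because the mixture weights lie on the simplex, each imaginary latent interpolates between training-task dynamics rather than extrapolating arbitrarily, which is one plausible way to obtain tasks outside the training set while staying in the span of learned dynamics.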