Meta-reinforcement learning (meta-RL) addresses the sample inefficiency of deep RL by leveraging experience gathered in past tasks to solve a new task. However, most meta-RL methods require partially or fully on-policy data, i.e., they cannot reuse data collected by past policies, which limits how much sample efficiency can improve. To alleviate this problem, we propose a novel off-policy meta-RL method, embedding learning and evaluation of uncertainty (ELUE). An ELUE agent is characterized by learning a feature embedding space shared among tasks: it learns a belief model over the embedding space together with a belief-conditional policy and Q-function. For a new task, it collects data with the pretrained policy and updates its belief according to the belief model. Thanks to this belief update, performance can improve with only a small amount of data. In addition, once enough data have been collected, the agent updates the neural network parameters to adjust the pretrained relationships. We demonstrate through experiments on meta-RL benchmarks that ELUE outperforms state-of-the-art meta-RL methods.
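To make the described meta-test procedure concrete, the sketch below outlines the adaptation loop suggested by the abstract: act with the pretrained belief-conditional policy, update the belief from newly collected data, and fine-tune the networks once enough data are available. All interfaces (`BeliefModel`, `policy.sample`, `finetune`, and the threshold) are illustrative assumptions, not the authors' actual implementation.

```python
def adapt_to_new_task(env, policy, q_function, belief_model,
                      n_episodes=50, finetune_threshold=10_000):
    """Hedged sketch of an ELUE-style adaptation loop on a new task."""
    belief = belief_model.prior()      # initial belief over the task embedding
    replay_buffer = []

    for _ in range(n_episodes):
        obs, done = env.reset(), False
        while not done:
            # The pretrained policy is conditioned on the current belief.
            action = policy.sample(obs, belief)
            next_obs, reward, done, _ = env.step(action)
            replay_buffer.append((obs, action, reward, next_obs, done))
            obs = next_obs

        # Belief update alone can improve performance with little data.
        belief = belief_model.update(belief, replay_buffer)

        # With enough data, also fine-tune the pretrained networks
        # (policy and belief-conditional Q-function) off-policy.
        if len(replay_buffer) >= finetune_threshold:
            finetune(policy, q_function, belief, replay_buffer)

    return policy, belief
```

The two-stage structure (cheap belief updates first, parameter fine-tuning only when data suffice) is what the abstract credits for good performance from small amounts of new-task data.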