Intelligent agents should have the ability to leverage knowledge from previously learned tasks in order to learn new ones quickly and efficiently. Meta-learning approaches have emerged as a popular solution to achieve this. However, meta-reinforcement learning (meta-RL) algorithms have thus far been restricted to simple environments with narrow task distributions. Moreover, the paradigm of pretraining followed by fine-tuning to adapt to new tasks has emerged as a simple yet effective solution in supervised and self-supervised learning. This calls into question the benefits of meta-learning approaches in reinforcement learning as well, since they typically come at the cost of high complexity. We therefore investigate meta-RL approaches on a variety of vision-based benchmarks, including Procgen, RLBench, and Atari, where evaluations are made on completely novel tasks. Our findings show that when meta-learning approaches are evaluated on different tasks (rather than different variations of the same task), multi-task pretraining with fine-tuning on new tasks performs as well as, or better than, meta-pretraining with meta test-time adaptation. This is encouraging for future research, as multi-task pretraining tends to be simpler and computationally cheaper than meta-RL. From these findings, we advocate for evaluating future meta-RL methods on more challenging tasks, and for including multi-task pretraining with fine-tuning as a simple yet strong baseline.