The idea of transfer in reinforcement learning (TRL) is intriguing: being able to transfer knowledge from one problem to another without learning everything from scratch. This promises faster learning and the ability to learn more complex methods. To gain insight into the field and to detect emerging trends, we performed a database search. We note a surprisingly late adoption of deep learning, starting only in 2018. The introduction of deep learning has not yet solved the greatest challenge of TRL: generalization. Transfer between different domains works well when the domains have strong similarities (e.g., MountainCar to Cartpole), and most TRL publications focus on different tasks within the same domain that have few differences. Most TRL applications we encountered compare their improvements against self-defined baselines, and the field is still missing unified benchmarks. We consider this a disappointing situation. For the future, we note that: (1) a clear measure of task similarity is needed; (2) generalization needs to improve, and promising approaches merge deep learning with planning via MCTS or introduce memory through LSTMs; (3) the lack of benchmarking tools needs to be remedied to enable meaningful comparison and to measure progress. Alchemy and Meta-World are already emerging as interesting benchmark suites. We note that another development, the increase in procedural content generation (PCG), can improve both benchmarking and generalization in TRL.