The ability of continual learning systems to transfer knowledge from previously seen tasks in order to maximize performance on new tasks remains a significant challenge for the field, limiting the applicability of continual learning solutions to realistic scenarios. Consequently, this study aims to broaden our understanding of transfer and its driving forces in the specific case of continual reinforcement learning (RL). We adopt Soft Actor-Critic (SAC) as the underlying RL algorithm and Continual World as a suite of continuous control tasks. We systematically study how different components of SAC (the actor and the critic, exploration, and data) affect transfer efficacy, and we provide recommendations regarding various modeling options. The best set of choices, dubbed ClonEx-SAC, is evaluated on the recent Continual World benchmark. ClonEx-SAC achieves an 87% final success rate, compared to 80% for PackNet, the previous best method in the benchmark. Moreover, transfer grows from 0.18 to 0.54 according to the metric provided by Continual World.
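For concreteness, the actor and critic components studied here are the two standard SAC objectives; the following restates them in the usual notation of the original SAC paper (Haarnoja et al., 2018), which the abstract does not spell out, so it should be read as a reminder rather than as the authors' exact formulation:

J_Q(\theta) = \mathbb{E}_{(s,a,r,s') \sim \mathcal{D}} \Big[ \tfrac{1}{2} \big( Q_\theta(s,a) - r - \gamma \, \mathbb{E}_{a' \sim \pi_\phi} \big[ Q_{\bar{\theta}}(s',a') - \alpha \log \pi_\phi(a' \mid s') \big] \big)^2 \Big],

J_\pi(\phi) = \mathbb{E}_{s \sim \mathcal{D},\; a \sim \pi_\phi} \big[ \alpha \log \pi_\phi(a \mid s) - Q_\theta(s,a) \big],

where \mathcal{D} is the replay buffer (the "data" component), \bar{\theta} denotes target-network parameters, and \alpha is the entropy temperature governing exploration.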
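The transfer numbers quoted above (0.18 and 0.54) refer, to the best of our reading, to the forward-transfer metric of the Continual World benchmark (Wołczyk et al., 2021), which normalizes the area under each task's success-rate training curve against a from-scratch reference run:

FT_i = \frac{\mathrm{AUC}_i - \mathrm{AUC}_i^b}{1 - \mathrm{AUC}_i^b},

where \mathrm{AUC}_i \in [0, 1] is the normalized area under the success-rate curve of task i trained within the continual sequence, \mathrm{AUC}_i^b is the same quantity for a single-task reference run, and the reported transfer is the mean of FT_i over all tasks.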