The ability to continuously acquire new knowledge and skills is crucial for autonomous agents. Existing methods are typically based on either fixed-size models that struggle to learn a large number of diverse behaviors, or growing-size models that scale poorly with the number of tasks. In this work, we aim to strike a better balance between an agent's size and performance by designing a method that grows adaptively depending on the task sequence. We introduce Continual Subspace of Policies (CSP), a new approach that incrementally builds a subspace of policies for training a reinforcement learning agent on a sequence of tasks. The subspace's high expressivity allows CSP to perform well for many different tasks while growing sublinearly with the number of tasks. Our method does not suffer from forgetting and displays positive transfer to new tasks. CSP outperforms a number of popular baselines on a wide range of scenarios from two challenging domains, Brax (locomotion) and Continual World (manipulation).
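To make the subspace idea concrete, here is a minimal sketch in PyTorch (not the authors' implementation): each anchor is a full policy network, and any convex combination of the anchors' weights is itself a valid policy, so a single set of anchors can represent a continuum of behaviors. The names (PolicySubspace, policy_at, grow), the toy network, and the growth rule shown in comments are illustrative assumptions based on the description above.

```python
import copy
import torch
import torch.nn as nn
from torch.nn.utils import parameters_to_vector, vector_to_parameters

def make_policy(obs_dim=8, act_dim=2):
    # Toy actor network standing in for an RL policy.
    return nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))

class PolicySubspace(nn.Module):
    """Subspace spanned by anchor networks: any convex combination of
    their weight vectors defines a concrete policy."""
    def __init__(self, num_anchors=2):
        super().__init__()
        self.anchors = nn.ModuleList([make_policy() for _ in range(num_anchors)])

    def policy_at(self, alpha):
        # Build the policy whose weights are sum_i alpha_i * theta_i.
        assert len(alpha) == len(self.anchors) and abs(sum(alpha) - 1.0) < 1e-6
        flat = sum(a * parameters_to_vector(net.parameters())
                   for a, net in zip(alpha, self.anchors))
        policy = make_policy()
        vector_to_parameters(flat, policy.parameters())
        return policy

    def grow(self):
        # On a sufficiently novel task, extend the subspace with a new anchor.
        # Per the abstract, an anchor would be kept only if it improves
        # performance enough, which is how growth stays sublinear in tasks.
        self.anchors.append(copy.deepcopy(self.anchors[-1]))

# Usage: sample convex weights and act with the resulting policy.
subspace = PolicySubspace(num_anchors=2)
alpha = torch.distributions.Dirichlet(torch.ones(2)).sample().tolist()
pi = subspace.policy_at(alpha)      # one concrete policy inside the subspace
action = pi(torch.randn(1, 8))
```

Under this reading, learning a new task means searching over the low-dimensional weights alpha (and, only when needed, adding an anchor), rather than training a separate network per task.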