We introduce ASPiRe (Adaptive Skill Prior for RL), a new approach that leverages prior experience to accelerate reinforcement learning. Unlike existing methods that learn a single skill prior from a large and diverse dataset, our framework learns a library of distinct skill priors (i.e., behavior priors) from a collection of specialized datasets, and learns how to combine them to solve a new task. This formulation allows the algorithm to acquire a set of specialized skill priors that are more reusable for downstream tasks; however, it also introduces the additional challenge of how to effectively combine this unstructured set of skill priors to form a new prior for new tasks. Specifically, it requires the agent to identify not only which skill prior(s) to use but also how to combine them (either sequentially or concurrently) to form a new prior. To achieve this goal, ASPiRe includes an Adaptive Weight Module (AWM) that learns to infer an adaptive weight assignment between the different skill priors and uses it to guide policy learning for downstream tasks via weighted Kullback-Leibler divergences. Our experiments demonstrate that ASPiRe can significantly accelerate the learning of new downstream tasks in the presence of multiple priors and shows improvement over competitive baselines.
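To make the weighted-KL mechanism concrete, below is a minimal sketch (not the authors' implementation) of how a weight assignment over skill priors can regularize a policy: given per-state weights w_k over K skill priors, the regularizer is the weighted sum of KL divergences between the policy and each prior. All function names, shapes, and the use of diagonal Gaussian distributions here are illustrative assumptions.

```python
# Minimal sketch of a weighted-KL regularizer over multiple skill priors.
# Assumes policy and priors are diagonal Gaussians over actions; names are illustrative.
import torch
from torch.distributions import Normal, Independent, kl_divergence

def weighted_kl_regularizer(policy_dist, prior_dists, weights):
    """Compute sum_k w_k * KL(policy || prior_k) for a batch of states.

    policy_dist: Independent(Normal) over actions, batch_shape = [B]
    prior_dists: list of K Independent(Normal) distributions, each batch_shape = [B]
    weights:     tensor of shape [B, K], rows summing to 1 (e.g., output of a weight module)
    """
    kls = torch.stack(
        [kl_divergence(policy_dist, p) for p in prior_dists], dim=-1
    )  # shape [B, K]
    return (weights * kls).sum(dim=-1)  # shape [B]

# Toy usage with random Gaussians standing in for the learned policy and priors.
B, A, K = 4, 2, 3
policy = Independent(Normal(torch.zeros(B, A), torch.ones(B, A)), 1)
priors = [Independent(Normal(torch.randn(B, A), torch.ones(B, A)), 1) for _ in range(K)]
w = torch.softmax(torch.randn(B, K), dim=-1)
print(weighted_kl_regularizer(policy, priors, w))  # per-state regularization term
```

In such a scheme, the regularizer would be subtracted from the RL objective so that the policy stays close to whichever priors the weights currently favor.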