A modular design encourages neural models to disentangle and recombine different facets of knowledge to generalise more systematically to new tasks. In this work, we assume that each task is associated with a subset of latent discrete skills from a (potentially small) inventory. In turn, skills correspond to parameter-efficient (sparse / low-rank) model parameterisations. By jointly learning these and a task-skill allocation matrix, the network for each task is instantiated as the average of the parameters of active skills. To favour non-trivial soft partitions of skills across tasks, we experiment with a series of inductive biases, such as an Indian Buffet Process prior and a two-speed learning rate. We evaluate our latent-skill model on two main settings: 1) multitask reinforcement learning for grounded instruction following on 8 levels of the BabyAI platform; and 2) few-shot adaptation of pre-trained text-to-text generative models on CrossFit, a benchmark comprising 160 NLP tasks. We find that the modular design of a network significantly increases sample efficiency in reinforcement learning and few-shot generalisation in supervised learning, compared to baselines with fully shared, task-specific, or conditionally generated parameters where knowledge is entangled across tasks. In addition, we show how discrete skills help interpretability, as they yield an explicit hierarchy of tasks.
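To make the skill-averaging mechanism concrete, here is a minimal sketch (not the authors' implementation) of a single linear layer whose task-specific weight is the average of the parameters of the skills allocated to that task, with a soft task-skill allocation matrix and a two-speed learning rate implemented via two optimiser parameter groups. All names, dimensions, and learning rates are illustrative assumptions, and the IBP prior is omitted.

```python
import torch
import torch.nn as nn

class SkillAllocatedLinear(nn.Module):
    """Sketch: a linear layer parameterised as the average of active skill weights."""

    def __init__(self, n_tasks, n_skills, d_in, d_out):
        super().__init__()
        # One parameterisation per latent skill (dense here; the paper uses
        # parameter-efficient sparse / low-rank variants).
        self.skill_weights = nn.Parameter(torch.randn(n_skills, d_out, d_in) * 0.02)
        # Task-skill allocation logits; a sigmoid gives a relaxed binary matrix.
        self.alloc_logits = nn.Parameter(torch.zeros(n_tasks, n_skills))

    def forward(self, x, task_id):
        # Soft allocation of skills to the given task.
        alloc = torch.sigmoid(self.alloc_logits[task_id])            # (n_skills,)
        # Task weight = average of the parameters of the active skills.
        weight = torch.einsum("s,soi->oi", alloc, self.skill_weights)
        weight = weight / alloc.sum().clamp(min=1e-6)
        return x @ weight.t()

# Two-speed learning rate: the allocation matrix is updated faster than the
# skill parameters (values are assumptions, not the paper's settings).
layer = SkillAllocatedLinear(n_tasks=8, n_skills=4, d_in=16, d_out=16)
optimizer = torch.optim.Adam([
    {"params": [layer.skill_weights], "lr": 1e-4},
    {"params": [layer.alloc_logits], "lr": 1e-3},
])
out = layer(torch.randn(2, 16), task_id=0)
```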