Multitask learning assumes that models capable of learning from multiple tasks can achieve better quality and efficiency via knowledge transfer, a key feature of human learning. However, state-of-the-art ML models rely on high customization for each task and leverage model size and data scale rather than scaling the number of tasks. Moreover, continual learning, which adds the temporal dimension to multitask learning, is often focused on the study of common pitfalls such as catastrophic forgetting rather than being studied at large scale as a critical component for building the next generation of artificial intelligence. We propose an evolutionary method that can generate a large-scale multitask model and that supports the dynamic and continuous addition of new tasks. The generated multitask model is sparsely activated and integrates a task-based routing that guarantees a bounded compute cost and fewer added parameters per task as the model expands. The proposed method relies on a knowledge compartmentalization technique to achieve immunity against catastrophic forgetting and other common pitfalls such as gradient interference and negative transfer. We empirically show that the proposed method can jointly solve and achieve competitive results on 69 image classification tasks, for example achieving the best test accuracy reported for a model trained only on public data on competitive tasks such as CIFAR-10: 99.43%.
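As a minimal illustrative sketch (not the paper's implementation) of how sparse activation with task-based routing can bound compute and keep parameter growth per task small: each task activates only a fixed-size subset of a shared module pool plus a small task-specific head. The class and parameter names below are assumptions for illustration, and the random route selection merely stands in for the evolutionary search of routes that the proposed method performs.

```python
import torch
import torch.nn as nn


class RoutedMultitaskModel(nn.Module):
    """Illustrative sparsely activated multitask model with per-task routes."""

    def __init__(self, num_modules=16, modules_per_task=4, hidden=256):
        super().__init__()
        # Pool of shared modules; every task reuses a subset of them.
        self.modules_pool = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
            for _ in range(num_modules)
        )
        self.modules_per_task = modules_per_task
        self.routes = {}              # task name -> indices of activated modules
        self.heads = nn.ModuleDict()  # task name -> small task-specific head

    def add_task(self, name, num_classes, hidden=256):
        # Adding a task only adds a route (a list of indices) and a small head;
        # the shared pool is untouched, so few parameters are added per task.
        # Routes are picked randomly here; the paper evolves them instead.
        scores = torch.rand(len(self.modules_pool))
        self.routes[name] = torch.topk(scores, self.modules_per_task).indices.tolist()
        self.heads[name] = nn.Linear(hidden, num_classes)

    def forward(self, x, task):
        # Only the modules on this task's route run (sparse activation), so
        # per-example compute is bounded by `modules_per_task`.
        for idx in self.routes[task]:
            x = self.modules_pool[idx](x)
        return self.heads[task](x)


# Usage: tasks can be added dynamically without touching existing routes.
model = RoutedMultitaskModel()
model.add_task("cifar10", num_classes=10)
logits = model(torch.randn(8, 256), task="cifar10")
```

Because each task owns its route and head while sharing the frozen pool structure, updates for a new task do not overwrite components used exclusively by earlier tasks, which is the intuition behind the knowledge compartmentalization claim.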