Continual Learning (CL) methods mainly focus on avoiding catastrophic forgetting and on learning representations that transfer to new tasks. Recently, Wortsman et al. (2020) proposed a CL method, SupSup, which uses a randomly initialized, fixed base network (model) and finds a supermask for each new task that selectively keeps or removes each weight to produce a subnetwork. Forgetting is prevented because the network weights are never updated. However, although there is no forgetting, the performance of a supermask is suboptimal because the fixed weights restrict its representational power. Furthermore, no knowledge is accumulated or transferred inside the model as new tasks are learned. Hence, we propose ExSSNeT (Exclusive Supermask SubNEtwork Training), which performs exclusive and non-overlapping subnetwork weight training. This avoids conflicting updates to shared weights from subsequent tasks, improving performance while still preventing forgetting. Furthermore, we propose a novel KNN-based Knowledge Transfer (KKT) module that dynamically initializes a new task's mask based on previous tasks' masks to improve knowledge transfer. We demonstrate that ExSSNeT outperforms SupSup and other strong prior methods on both text-classification and vision tasks while preventing forgetting. Moreover, ExSSNeT is particularly advantageous for sparse masks that activate 2-10% of the model parameters, yielding an average improvement of 8.3% over SupSup. Additionally, ExSSNeT scales to a large number of tasks (100), and our KKT module helps learn new tasks faster while improving overall performance. Our code is available at https://github.com/prateeky2806/exessnet
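To make these mechanisms concrete, below is a minimal PyTorch sketch (not the authors' implementation) of (i) a supermask-gated layer, (ii) the exclusive-update rule that keeps new-task training from touching weights already claimed by earlier tasks, and (iii) a KNN-style mask initialization in the spirit of the KKT module. The names (`SupermaskLinear`, `exclusive_update_mask`, `knn_mask_init`), the single-linear-layer setup, and the centroid-based nearest-task lookup are illustrative assumptions, not the paper's exact procedure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SupermaskLinear(nn.Module):
    """Linear layer whose weights are gated by per-task binary supermasks."""

    def __init__(self, in_features, out_features):
        super().__init__()
        # Randomly initialized base weights; SupSup keeps these frozen, while
        # exclusive training updates only weights no earlier task has claimed.
        self.weight = nn.Parameter(0.01 * torch.randn(out_features, in_features))
        self.masks = {}  # task_id -> {0,1} tensor with the same shape as weight

    def forward(self, x, task_id):
        # Only weights selected by this task's mask take part in the forward pass.
        return F.linear(x, self.weight * self.masks[task_id])


def exclusive_update_mask(new_mask, previous_masks):
    """Weights this task may train: selected by its mask AND unclaimed so far."""
    claimed = torch.zeros_like(new_mask, dtype=torch.bool)
    for m in previous_masks:
        claimed |= m.bool()
    return new_mask.bool() & ~claimed


def knn_mask_init(new_task_feats, task_centroids, task_masks):
    """Hypothetical KKT-style init: copy the mask of the previous task whose
    feature centroid lies nearest to the new task's features."""
    centroid = new_task_feats.mean(dim=0)
    dists = torch.stack([torch.norm(centroid - c) for c in task_centroids])
    return task_masks[int(torch.argmin(dists))].clone()


# Sketch of one training step for task t: gradients on weights owned by
# earlier tasks are zeroed, so their subnetworks are never overwritten.
layer = SupermaskLinear(8, 4)
prev_masks = [torch.bernoulli(torch.full_like(layer.weight, 0.1))]
mask_t = torch.bernoulli(torch.full_like(layer.weight, 0.1))
layer.masks[1] = mask_t

out = layer(torch.randn(2, 8), task_id=1)
out.sum().backward()
layer.weight.grad.mul_(exclusive_update_mask(mask_t, prev_masks).float())
```

Note that this demo fixes random masks only to stay self-contained; in SupSup and ExSSNeT the masks themselves are learned by optimizing per-weight scores, and KKT selects the source task with a KNN lookup over features rather than the simple centroid comparison sketched here.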