Adapting model parameters to incoming streams of data is a crucial factor in deep learning scalability. Interestingly, prior continual learning strategies in online settings inadvertently anchor their updated parameters to a local parameter subspace in order to remember old tasks, or else drift away from that subspace and forget. From this observation, we formulate a trade-off between constructing multiple parameter modes and allocating tasks per mode. Mode-Optimized Task Allocation (MOTA), our contributed adaptation strategy, trains multiple modes in parallel and then optimizes the task allocation per mode. We empirically demonstrate improvements over baseline continual learning strategies and across varying distribution shifts, namely sub-population, domain, and task shift.
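To make the two-stage idea above concrete, the following is a minimal sketch of training several parameter modes and then allocating tasks to them. It is an illustrative assumption, not the paper's actual objective: the greedy allocation rule, the use of plain validation loss as the allocation cost, and the helper names (`train_mode`, `task_cost`, `mota_sketch`) are all hypothetical, and the sketch assumes PyTorch models and task data loaders yielding `(x, y)` batches.

```python
import copy
import torch
import torch.nn.functional as F

def train_mode(model, loader, epochs=1, lr=1e-3):
    """Fit one parameter mode on a task's data stream (illustrative)."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()
    return model

@torch.no_grad()
def task_cost(model, loader):
    """Cost of placing a task on a mode; here simply average validation loss
    (an assumption standing in for the paper's allocation objective)."""
    model.eval()
    losses = [F.cross_entropy(model(x), y).item() for x, y in loader]
    return sum(losses) / len(losses)

def mota_sketch(base_model, task_loaders, num_modes=3):
    """Stage 1: train multiple modes; Stage 2: allocate each task to the mode
    that incurs the lowest cost for it (greedy stand-in for the optimization)."""
    modes = [copy.deepcopy(base_model) for _ in range(num_modes)]
    for mode in modes:                      # the paper trains modes in parallel;
        for loader in task_loaders:         # a sequential loop keeps the sketch simple
            train_mode(mode, loader)
    allocation = {
        t: min(range(num_modes), key=lambda m: task_cost(modes[m], loader))
        for t, loader in enumerate(task_loaders)
    }
    return modes, allocation
```

In this reading, the trade-off in the abstract shows up as the choice of `num_modes`: more modes reduce interference between the tasks assigned to each mode, at the cost of training and storing additional parameter sets.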