The problem of class incremental learning (CIL) is considered. State-of-the-art approaches use a dynamic architecture based on network expansion (NE), in which a task expert is added per task. While effective from a computational standpoint, these methods lead to models that grow quickly with the number of tasks. A new NE method, dense network expansion (DNE), is proposed to achieve a better trade-off between accuracy and model complexity. This is accomplished by introducing dense connections between the intermediate layers of the task expert networks, which enable the transfer of knowledge from old to new tasks via feature sharing and reuse. This sharing is implemented with a cross-task attention mechanism, based on a new task attention block (TAB), which fuses information across tasks. Unlike traditional attention mechanisms, the TAB operates at the level of feature mixing and is decoupled from spatial attention. This is shown to be more effective for CIL than joint spatial-and-task attention. The proposed DNE approach can strictly maintain the feature space of old classes while growing the network and feature scale at a much slower rate than previous methods. As a result, it outperforms previous SOTA methods by a margin of 4\% in accuracy, with a similar or even smaller model scale.
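To make the cross-task attention idea concrete, the sketch below illustrates (under stated assumptions, not the authors' implementation) a task attention block that attends only over the task dimension: spatial positions are folded into the batch, so each location mixes features across task experts without any spatial attention. The class name `TaskAttentionBlock`, the dimensions, and the residual/normalization choices are illustrative assumptions.

```python
# Minimal sketch of cross-task attention at the feature-mixing level,
# decoupled from spatial attention. Not the paper's implementation;
# all names and hyperparameters here are hypothetical.
import torch
import torch.nn as nn


class TaskAttentionBlock(nn.Module):
    """Fuse per-task features by attending over the task dimension only."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, tasks, height, width, dim) -- one feature map per task expert
        b, t, h, w, d = feats.shape
        # Fold spatial positions into the batch so attention only sees the task axis.
        x = feats.permute(0, 2, 3, 1, 4).reshape(b * h * w, t, d)
        q = self.norm(x)
        fused, _ = self.attn(q, q, q)
        # Residual connection, then restore the original (batch, tasks, H, W, dim) layout.
        out = (x + fused).reshape(b, h, w, t, d).permute(0, 3, 1, 2, 4)
        return out


# Toy usage: 2 task experts, 7x7 feature maps with 64 channels.
if __name__ == "__main__":
    tab = TaskAttentionBlock(dim=64)
    x = torch.randn(8, 2, 7, 7, 64)
    print(tab(x).shape)  # torch.Size([8, 2, 7, 7, 64])
```

Because attention is computed over the (small) task axis rather than over spatial tokens, the cost grows with the number of task experts rather than with image resolution, which is consistent with the abstract's claim of slower growth in network and feature scale.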