Class-Incremental Learning (CIL) enables models to learn new classes continually while preserving past knowledge. Recently, vision-language models such as CLIP have offered transferable features via multi-modal pre-training, making them well suited for CIL. However, real-world visual and linguistic concepts are inherently hierarchical: a textual concept like "dog" subsumes fine-grained categories such as "Labrador" and "Golden Retriever," and each category in turn comprises its own images. Existing CLIP-based CIL methods, however, fail to explicitly capture this inherent hierarchy, causing fine-grained class features to drift during incremental updates and ultimately leading to catastrophic forgetting. To address this challenge, we propose HASTEN (Hierarchical Semantic Tree Anchoring), which anchors hierarchical information into CIL to reduce catastrophic forgetting. First, we employ an external knowledge graph as supervision to embed visual and textual features in hyperbolic space, effectively preserving the hierarchical structure as data evolves. Second, to mitigate catastrophic forgetting, we project gradients onto the null space of the shared hyperbolic mapper, preventing interference with prior tasks. These two steps work synergistically, enabling the model to resist forgetting by maintaining hierarchical relationships. Extensive experiments show that HASTEN consistently outperforms existing methods while providing a unified structured representation.
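The two ingredients above can be illustrated with a minimal numpy sketch. This is not the paper's implementation: it assumes the shared hyperbolic mapper behaves locally like a linear layer, uses the standard Poincaré-ball distance as the hyperbolic metric, and the feature matrices `X` and `G` are random stand-ins for old-task features and a raw gradient.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance on the Poincare ball, a standard model of hyperbolic space."""
    uu, vv = np.dot(u, u), np.dot(v, v)
    duv = np.dot(u - v, u - v)
    x = 1.0 + 2.0 * duv / max((1.0 - uu) * (1.0 - vv), eps)
    return np.arccosh(x)

def null_space_projector(X, eps=1e-10):
    """Projector onto the null space of the rows of X (features of prior tasks)."""
    _, s, vt = np.linalg.svd(X, full_matrices=True)
    rank = int(np.sum(s > eps * s.max()))
    V_r = vt[:rank]                      # orthonormal basis of the row space of X
    return np.eye(X.shape[1]) - V_r.T @ V_r

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))              # old-task features entering the mapper (assumed)
G = rng.normal(size=(3, 8))              # raw gradient of a linear mapper W (assumed)
P = null_space_projector(X)
G_proj = G @ P                           # constrained update direction
# updating W with G_proj leaves mapper outputs on old inputs unchanged:
assert np.allclose(G_proj @ X.T, 0.0, atol=1e-8)

# hierarchy intuition: coarse concepts sit near the origin of the ball,
# fine-grained children near the boundary (coordinates are illustrative)
dog      = np.array([0.10, 0.00])
labrador = np.array([0.80, 0.05])
assert poincare_distance(dog, dog) == 0.0
```

The null-space constraint is what blocks interference: any update direction multiplied by `P` has zero effect on the mapper's outputs for previously seen features, so hierarchical positions learned for old classes are preserved while new classes are still free to move.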