Unsupervised lifelong learning refers to the ability to learn over time while retaining previously learned patterns without supervision. Although great progress has been made in this direction, existing work often assumes strong prior knowledge about the incoming data (e.g., knowing the class boundaries), which can be impossible to obtain in complex and unpredictable environments. In this paper, motivated by real-world scenarios, we propose a more practical problem setting called online self-supervised lifelong learning without prior knowledge. The proposed setting is challenging because the data are non-iid and single-pass, and there is neither external supervision nor prior knowledge. To address these challenges, we propose Self-Supervised ContrAstive Lifelong LEarning without Prior Knowledge (SCALE), which can extract and memorize representations on the fly purely from the data continuum. SCALE is designed around three major components: a pseudo-supervised contrastive loss, a self-supervised forgetting loss, and an online memory update with uniform subset selection. All three components are designed to work collaboratively to maximize learning performance. We perform comprehensive experiments with SCALE under an iid data stream and four non-iid data streams. The results show that SCALE outperforms the state-of-the-art algorithm in all settings, with kNN accuracy improvements of up to 3.83%, 2.77%, and 5.86% on the CIFAR-10, CIFAR-100, and TinyImageNet datasets, respectively.
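The abstract names three components but does not spell out their formulations. Purely as a hedged illustration of the first component, the sketch below shows one plausible form of a pseudo-supervised contrastive loss: a SupCon-style objective in which positives are pairs sharing the same pseudo-label (e.g., an online cluster assignment) instead of a ground-truth class. The function name, temperature value, and pseudo-label source are assumptions made for this sketch, not SCALE's actual implementation.

```python
# Sketch only: SupCon-style contrastive loss driven by pseudo-labels.
# This is an assumed illustration of a "pseudo-supervised contrastive loss";
# SCALE's exact formulation may differ.
import torch
import torch.nn.functional as F

def pseudo_supervised_contrastive_loss(features, pseudo_labels, temperature=0.1):
    """features: (N, D) embeddings; pseudo_labels: (N,) cluster/pseudo-class ids."""
    features = F.normalize(features, dim=1)                 # cosine-similarity space
    sim = features @ features.T / temperature               # pairwise similarities
    self_mask = torch.eye(len(features), dtype=torch.bool, device=features.device)
    sim.masked_fill_(self_mask, float('-inf'))              # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # positives: samples sharing the same pseudo-label (excluding self)
    pos_mask = (pseudo_labels.unsqueeze(0) == pseudo_labels.unsqueeze(1)) & ~self_mask
    pos_count = pos_mask.sum(dim=1).clamp(min=1)
    loss = -(log_prob * pos_mask).sum(dim=1) / pos_count
    return loss.mean()

# Toy usage with random embeddings and pseudo-labels.
feats = torch.randn(8, 16)
labels = torch.randint(0, 3, (8,))
print(pseudo_supervised_contrastive_loss(feats, labels).item())
```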