A dataset is a collection of crucial evidence that describes a task. However, not every data point in a dataset has the same potential: some data points are more representative or informative than others. This unequal importance among data points can have a large impact on rehearsal-based continual learning, where a subset of the training examples (the coreset) is stored and replayed later to alleviate catastrophic forgetting. In continual learning, the quality of the samples stored in the coreset directly affects the model's effectiveness and efficiency. The coreset selection problem becomes even more important under realistic settings, such as imbalanced continual learning or noisy data scenarios. To tackle this problem, we propose Online Coreset Selection (OCS), a simple yet effective method that selects the most representative and informative coreset at each iteration and trains on it in an online manner. Our proposed method maximizes the model's adaptation to the target dataset while selecting samples with high affinity to past tasks, which directly inhibits catastrophic forgetting. We validate the effectiveness of our coreset selection mechanism on various standard, imbalanced, and noisy datasets against strong continual learning baselines, demonstrating that it improves task adaptation and prevents catastrophic forgetting in a sample-efficient manner.
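To make the selection step concrete, below is a minimal sketch of what an online coreset-selection iteration could look like, assuming a gradient-cosine-similarity criterion: representativeness is scored as similarity to the current minibatch gradient, and affinity to past tasks as similarity to a gradient computed on the stored coreset. The function names, the weighting parameter `tau`, and the specific scoring rule are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def per_sample_grads(model, loss_fn, x, y):
    """Return a flattened gradient vector for each sample in the candidate batch."""
    params = [p for p in model.parameters() if p.requires_grad]
    grads = []
    for xi, yi in zip(x, y):
        loss = loss_fn(model(xi.unsqueeze(0)), yi.unsqueeze(0))
        g = torch.autograd.grad(loss, params)
        grads.append(torch.cat([gi.flatten() for gi in g]))
    return torch.stack(grads)  # shape: (batch_size, n_params)

def select_coreset(model, loss_fn, x, y, coreset_grad, k, tau=1.0):
    """Hypothetical online selection step: score each candidate by its
    representativeness of the current minibatch plus its affinity to past
    tasks (via the coreset gradient), then keep the top-k samples."""
    g = per_sample_grads(model, loss_fn, x, y)
    minibatch_grad = g.mean(dim=0, keepdim=True)                   # (1, n_params)
    representativeness = F.cosine_similarity(g, minibatch_grad)    # adaptation to current task
    affinity = F.cosine_similarity(g, coreset_grad.unsqueeze(0))   # agreement with past tasks
    scores = representativeness + tau * affinity
    idx = scores.topk(k).indices
    return x[idx], y[idx]
```

In this sketch, the selected top-k samples would be used both for the current gradient update and to refresh the replay coreset; `tau` trades off adaptation to the new task against retention of past tasks.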