Classical machine learning algorithms often assume that the data are drawn i.i.d. from a stationary probability distribution. Continual learning has recently emerged as a rapidly growing area of machine learning in which this assumption is relaxed: the data distribution is non-stationary, i.e., it changes over time. However, distribution drifts may interfere with the learning process and erase previously learned knowledge, so continual learning algorithms must include specialized mechanisms to cope with them. A distribution drift may change the class-label distribution, the input distribution, or both; moreover, it may be abrupt or gradual. In this paper, we identify and categorize different types of data distribution drifts, together with possible assumptions about them, in order to better characterize various continual-learning scenarios. We further propose to use the distribution-drift framework to give more precise definitions of several terms commonly used in the continual-learning field.
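To make these distinctions concrete, the drift types can be sketched with a time-indexed data-generating distribution (illustrative notation; the paper's own formalization may differ). Writing the joint distribution at time $t$ as

\[
P_t(x, y) = P_t(y)\, P_t(x \mid y),
\]

a drift in the class-label distribution corresponds to $P_t(y)$ changing over time, while a drift in the input distribution corresponds to the marginal $P_t(x) = \sum_y P_t(y)\, P_t(x \mid y)$ changing over time; both may change simultaneously. An abrupt drift can then be modeled as a switch between two fixed distributions $P^{(1)}$ and $P^{(2)}$ at a change point $t^*$,

\[
P_t = P^{(1)} \text{ for } t < t^*, \qquad P_t = P^{(2)} \text{ for } t \ge t^*,
\]

whereas a gradual drift interpolates between them, e.g.

\[
P_t = \bigl(1 - \alpha(t)\bigr)\, P^{(1)} + \alpha(t)\, P^{(2)},
\]

with $\alpha(t)$ non-decreasing from $0$ to $1$.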