Prior works on self-supervised pre-training focus on the joint-training scenario, where massive unlabeled data are assumed to be available all at once, and only then is a learner trained. Unfortunately, such a problem setting is often impractical, if not infeasible, since many real-world tasks rely on sequential learning, e.g., data are decentralized or collected in a streaming fashion. In this paper, we conduct the first thorough and dedicated investigation of self-supervised pre-training with streaming data, aiming to shed light on model behavior under this overlooked setup. Specifically, we pre-train over 500 models on four categories of pre-training streaming data from ImageNet and DomainNet and evaluate them on three types of downstream tasks and 12 different downstream datasets. Our studies show that, somewhat surprisingly, with simple data replay or parameter regularization, sequential self-supervised pre-training turns out to be an efficient alternative to joint pre-training, as the performance of the former is mostly on par with that of the latter. Moreover, catastrophic forgetting, a common issue in sequential supervised learning, is much alleviated in sequential self-supervised learning (SSL), which we justify through a comprehensive empirical analysis of the learned representations and the sharpness of minima in the loss landscape. Our findings therefore suggest that, in practice, cumbersome joint SSL training can largely be replaced by sequential learning, which in turn enables a much broader spectrum of potential application scenarios.
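The "simple data replay" mentioned in the abstract can be illustrated with a minimal sketch: a fixed-size buffer of past samples, filled via reservoir sampling, from which each incoming batch is augmented with old data during sequential pre-training. This is a generic continual-learning illustration, not the paper's exact implementation; the class and function names below are hypothetical.

```python
import random

class ReplayBuffer:
    """Fixed-size sample buffer maintained by reservoir sampling.

    During sequential pre-training, each incoming mini-batch is mixed
    with samples drawn from this buffer, so the model keeps revisiting
    data from earlier stages of the stream. Illustrative sketch only;
    the paper's replay details may differ.
    """

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.storage = []          # retained samples
        self.seen = 0              # total samples observed so far
        self.rng = random.Random(seed)

    def add(self, sample):
        """Offer one streamed sample to the buffer."""
        self.seen += 1
        if len(self.storage) < self.capacity:
            self.storage.append(sample)
        else:
            # Reservoir sampling: every sample seen so far ends up in the
            # buffer with equal probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.storage[j] = sample

    def sample(self, k):
        """Draw a replay mini-batch without replacement."""
        return self.rng.sample(self.storage, min(k, len(self.storage)))


def mixed_batch(new_batch, buffer, replay_size):
    """Concatenate the current streamed batch with replayed old samples."""
    return list(new_batch) + buffer.sample(replay_size)
```

In a sequential SSL loop, one would feed `mixed_batch(...)` to the self-supervised objective in place of the raw streamed batch, then call `buffer.add(...)` on the new samples.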