Recently, the focus of the computer vision community has shifted from expensive supervised learning towards self-supervised learning of visual representations. While the performance gap between supervised and self-supervised learning has been narrowing, the training time of self-supervised deep networks remains an order of magnitude larger than that of their supervised counterparts, which hinders progress, imposes carbon cost, and limits societal benefits to institutions with substantial resources. Motivated by these issues, this paper investigates reducing the training time of recent self-supervised methods via various model-agnostic strategies that have not previously been applied to this problem. In particular, we study three strategies: an extendable cyclic learning rate schedule, a matched progressive schedule of augmentation magnitude and image resolution, and a hard positive mining strategy based on augmentation difficulty. We show that all three methods combined lead to up to a 2.7-fold speed-up in the training time of several self-supervised methods while retaining performance comparable to the standard self-supervised learning setting.