With the latest advances in deep learning, the online learning paradigm has received considerable attention due to its relevance in practical settings. Although many methods have been investigated for learning in scenarios where the data arrive as a continuous stream over time, training sparse networks in such settings has often been overlooked. In this paper, we explore the problem of training a neural network to a target sparsity in a particular case of online learning: the anytime learning at macroscale (ALMA) paradigm. We propose a novel progressive pruning method, referred to as \textit{Anytime Progressive Pruning} (APP); the proposed approach significantly outperforms the dense baseline and Anytime OSP models across multiple architectures and datasets under short, moderate, and long-sequence training. For example, in few-shot Restricted ImageNet training, our method improves accuracy by $\approx 7\%$ and reduces the generalization gap by $\approx 22\%$, while being $\approx 1/3$ the size of the dense baseline model. We further observe interesting non-monotonic transitions in the generalization gap in the ALMA setting with a large number of megabatches. The code and experiment dashboards can be accessed at \url{https://github.com/landskape-ai/Progressive-Pruning} and \url{https://wandb.ai/landskape/APP}, respectively.
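For intuition, the following is a minimal PyTorch sketch of how progressive magnitude pruning can be interleaved with ALMA-style megabatch training. It is an illustrative assumption rather than the exact APP procedure: the linear sparsity schedule, the helper names, and the training loop are placeholders.

\begin{verbatim}
# Minimal sketch (assumption, not the exact APP algorithm): interleave
# megabatch training with progressive L1-magnitude pruning in PyTorch.
# The linear sparsity schedule and helper names are illustrative only.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

def train_on_megabatch(model, loader, epochs=1, lr=1e-3):
    # Plain supervised training on one megabatch of the ALMA stream.
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

def prune_step(model, fraction_of_remaining):
    # Prune a fraction of the *currently unpruned* weights in each
    # Conv/Linear layer by L1 magnitude; earlier masks persist.
    for module in model.modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            prune.l1_unstructured(module, name="weight",
                                  amount=fraction_of_remaining)

def anytime_progressive_pruning(model, megabatch_loaders,
                                target_sparsity=0.67):
    # After each megabatch, raise the cumulative sparsity linearly
    # toward the target so the final model keeps ~1/3 of its weights.
    n = len(megabatch_loaders)
    pruned = 0.0
    for t, loader in enumerate(megabatch_loaders, start=1):
        train_on_megabatch(model, loader)
        cumulative = target_sparsity * t / n
        step = (cumulative - pruned) / max(1.0 - pruned, 1e-8)
        prune_step(model, step)
        pruned = cumulative
    for module in model.modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            # Bake the final masks into the weight tensors.
            prune.remove(module, "weight")
    return model
\end{verbatim}

Keeping the pruning reparametrization active between megabatches ensures that previously pruned weights remain at zero while the surviving weights continue to adapt to newly arriving data.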