Recent advances in deep learning optimization have shown that only a subset of a model's parameters is truly necessary for successful training. Such a discovery could have broad impact, from theory to application; however, finding these trainable sub-networks is typically a costly process. This inhibits practical use: can the learned sub-graph structures in deep learning models be found at training time? In this work we explore this possibility, observing and explaining why common approaches typically fail in the extreme scenarios of interest, and proposing an approach that potentially enables training with reduced computational effort. Experiments on challenging architectures and datasets suggest that such a computational gain is algorithmically accessible, and in particular a trade-off emerges between the accuracy achieved and the training complexity deployed.