Deep learning has achieved tremendous success by training increasingly large models, which are then compressed for practical deployment. We propose a drastically different approach to compact and optimal deep learning: we decouple the degrees of freedom (DoF) from the actual number of parameters of a model, and optimize a small DoF under predefined random linear constraints for a large model of arbitrary architecture, in one-stage end-to-end learning. Specifically, we create a recurrent parameter generator (RPG), which repeatedly fetches parameters from a ring and unpacks them onto a large model with random permutation and sign flipping to promote parameter decorrelation. We show that gradient descent can automatically find the best model under the constraints, with faster convergence. Our extensive experimentation reveals a log-linear relationship between model DoF and accuracy. Our RPG demonstrates remarkable DoF reduction and can be further pruned and quantized for additional run-time performance gains. For example, in terms of top-1 accuracy on ImageNet, RPG achieves $96\%$ of ResNet18's performance with only $18\%$ DoF (the equivalent of one convolutional layer) and $52\%$ of ResNet34's performance with only $0.25\%$ DoF! Our work shows the significant potential of constrained neural optimization in compact and optimal deep learning.
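The ring-fetch-and-unpack mechanism described above can be sketched in a few lines. The following is a minimal illustrative NumPy sketch, not the paper's implementation: it assumes a shared parameter ring whose entries are the model's only trainable DoF, and fixed (non-trainable) random permutations and sign flips that map ring entries onto each layer's weight tensor. The function and variable names (`ring`, `make_layer_mapping`, `unpack`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

ring_size = 1000                       # model DoF: the only trainable parameters
ring = rng.standard_normal(ring_size)  # the shared parameter ring

def make_layer_mapping(n_weights, offset):
    """Fixed mapping for one layer: which ring slots to read (wrapping
    around the ring), with a random permutation and random sign flips
    to promote decorrelation between layers sharing the same ring."""
    idx = (offset + np.arange(n_weights)) % ring_size  # fetch from the ring
    idx = rng.permutation(idx)                         # random permutation
    sign = rng.choice([-1.0, 1.0], size=n_weights)     # random sign flipping
    return idx, sign

def unpack(ring, idx, sign, shape):
    """Generate a layer's weight tensor from the shared ring."""
    return (sign * ring[idx]).reshape(shape)

# Two layers with 7000 total weights drawn from a 1000-entry ring.
layer_shapes = [(50, 100), (100, 20)]
offset, mappings = 0, []
for shape in layer_shapes:
    n = int(np.prod(shape))
    mappings.append((make_layer_mapping(n, offset), shape))
    offset += n

weights = [unpack(ring, idx, sign, shape) for (idx, sign), shape in mappings]
total = sum(w.size for w in weights)
print(total, ring_size)  # many effective weights, few degrees of freedom
```

Because each weight is a fixed sign times a ring entry, the mapping is a predefined random linear constraint: in training, gradients from every layer accumulate back onto the same small set of ring parameters, so standard gradient descent optimizes only `ring_size` DoF for the whole model.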