Efficient deep neural network (DNN) models equipped with compact operators (e.g., depthwise convolutions) have shown great potential in reducing DNNs' theoretical complexity (e.g., the total number of weights/operations) while maintaining a decent model accuracy. However, existing efficient DNNs are still limited in fulfilling their promise of boosting real-hardware efficiency, due to the low hardware utilization of their commonly adopted compact operators. In this work, we open up a new compression paradigm for developing real-hardware efficient DNNs, leading to boosted hardware efficiency while maintaining model accuracy. Interestingly, we observe that while some DNN layers' activation functions help DNNs' training optimization and achievable accuracy, they can be properly removed after training without compromising the model accuracy. Inspired by this observation, we propose a framework dubbed DepthShrinker, which develops hardware-friendly compact networks by shrinking the basic building blocks of existing efficient DNNs, which feature irregular computation patterns, into dense ones with much improved hardware utilization and thus real-hardware efficiency. Excitingly, our DepthShrinker framework delivers hardware-friendly compact networks that outperform both state-of-the-art efficient DNNs and compression techniques, e.g., a 3.06% higher accuracy and 1.53$\times$ higher throughput on a Tesla V100 over the SOTA channel-wise pruning method MetaPruning. Our codes are available at: https://github.com/facebookresearch/DepthShrinker.
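To make the core idea concrete, below is a minimal sketch (not the authors' released code) of how a block becomes mergeable once its internal activation functions are removed: an inverted residual block (1x1 conv, 3x3 depthwise conv, 1x1 conv) with no nonlinearities in between is a purely linear map and can be folded into a single dense 3x3 convolution. The helper name `merge_inverted_residual` and the bias-free assumption (e.g., BN already folded into the weights) are illustrative choices, not part of the paper.

```python
import torch
import torch.nn as nn

def merge_inverted_residual(pw1: nn.Conv2d, dw: nn.Conv2d, pw2: nn.Conv2d) -> nn.Conv2d:
    """Fold (1x1 conv -> 3x3 depthwise conv -> 1x1 conv), with its intermediate
    activations removed, into one standard 3x3 conv.
    Assumes bias-free convs and stride/padding taken from the depthwise conv."""
    c_mid = dw.in_channels
    assert dw.groups == c_mid and pw1.out_channels == c_mid and pw2.in_channels == c_mid

    w1 = pw1.weight[:, :, 0, 0]   # (c_mid, c_in)
    wd = dw.weight[:, 0, :, :]    # (c_mid, k, k), one spatial kernel per channel
    w2 = pw2.weight[:, :, 0, 0]   # (c_out, c_mid)

    # Equivalent dense kernel: W_eq[o, i, u, v] = sum_m w2[o, m] * wd[m, u, v] * w1[m, i]
    w_eq = torch.einsum('om,muv,mi->oiuv', w2, wd, w1)

    merged = nn.Conv2d(pw1.in_channels, pw2.out_channels,
                       kernel_size=dw.kernel_size, stride=dw.stride,
                       padding=dw.padding, bias=False)
    merged.weight.data.copy_(w_eq)
    return merged

# Sanity check: the merged dense conv matches the activation-free block numerically.
pw1 = nn.Conv2d(16, 96, 1, bias=False)
dw  = nn.Conv2d(96, 96, 3, padding=1, groups=96, bias=False)
pw2 = nn.Conv2d(96, 32, 1, bias=False)
x = torch.randn(2, 16, 8, 8)
merged = merge_inverted_residual(pw1, dw, pw2)
print(torch.allclose(pw2(dw(pw1(x))), merged(x), atol=1e-4))  # True
```

The merged operator is a plain dense convolution, which is exactly the kind of regular, high-utilization computation pattern that the abstract argues delivers real-hardware efficiency gains over the original sequence of compact operators.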