By adding exit layers to deep learning networks, early exit can terminate inference early while still producing accurate results. However, the decision to exit or continue to the next layer is passive: inference must pass through every pre-placed exit layer until it exits. It is also hard to adjust the configuration of the computing platform as inference proceeds. By incorporating a low-cost prediction engine, we propose Predictive Exit, a framework for computation- and energy-efficient deep learning applications. Predictive Exit forecasts where the network will exit (i.e., it establishes the number of remaining layers needed to finish the inference), which reduces the network's computation cost by exiting on time without running every pre-placed exit layer. Moreover, based on the number of remaining layers, a suitable computing configuration (i.e., frequency and voltage) is selected to execute the network and save further energy. Extensive experimental results demonstrate that Predictive Exit achieves up to 96.2% computation reduction and 72.9% energy saving compared with classic deep learning networks, and 12.8% computation reduction and 37.6% energy saving compared with early exit under state-of-the-art exiting strategies, given the same inference accuracy and latency.
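To make the frequency-and-voltage selection step concrete, the following is a minimal sketch, not the paper's implementation. It assumes a hypothetical table of operating points (`LEVELS`), a fixed per-layer cycle cost (`mcycles_per_layer`), and the common first-order model in which dynamic energy per cycle scales with the square of the supply voltage. Given a predicted number of remaining layers and a latency budget, it picks the slowest operating point that still meets the budget.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DvfsLevel:
    freq_mhz: float   # clock frequency in MHz
    voltage_v: float  # supply voltage in volts at that frequency


# Hypothetical operating points for illustration; not taken from the paper.
LEVELS = [
    DvfsLevel(400.0, 0.8),
    DvfsLevel(800.0, 0.9),
    DvfsLevel(1200.0, 1.0),
    DvfsLevel(1600.0, 1.1),
]


def select_level(remaining_layers, mcycles_per_layer, budget_ms, levels=LEVELS):
    """Return the slowest level that finishes the remaining layers in time.

    remaining_layers is assumed to come from the prediction engine;
    mcycles_per_layer is an assumed constant cost in millions of cycles.
    """
    total_mcycles = remaining_layers * mcycles_per_layer
    ordered = sorted(levels, key=lambda lv: lv.freq_mhz)
    for level in ordered:
        # Mcycles / MHz gives seconds; multiply by 1000 for milliseconds.
        time_ms = total_mcycles / level.freq_mhz * 1000.0
        if time_ms <= budget_ms:
            return level
    # No level meets the budget: fall back to the fastest one.
    return ordered[-1]


def relative_energy(level, remaining_layers, mcycles_per_layer):
    """First-order dynamic energy estimate: proportional to V^2 * cycles."""
    return level.voltage_v ** 2 * remaining_layers * mcycles_per_layer


if __name__ == "__main__":
    chosen = select_level(remaining_layers=4, mcycles_per_layer=2.0, budget_ms=12.0)
    fastest = max(LEVELS, key=lambda lv: lv.freq_mhz)
    print("chosen:", chosen)
    print("energy vs fastest:",
          relative_energy(chosen, 4, 2.0) / relative_energy(fastest, 4, 2.0))
```

In this sketch, fewer predicted remaining layers leave more slack in the latency budget, so a lower frequency and voltage can be chosen. This is the intuition behind selecting the configuration from the predicted exit point; the framework's actual selection policy may differ.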