DNNs are becoming less and less over-parametrised thanks to recent advances in efficient model design, through careful hand-crafted or NAS-based methods. Exploiting the fact that not all inputs require the same amount of computation to yield a confident prediction, adaptive inference is gaining attention as a prominent approach for pushing the limits of efficient deployment. In particular, early-exit networks constitute an emerging direction for tailoring the computation depth of each input sample at runtime, offering complementary performance gains to other efficiency optimisations. In this paper, we decompose the design methodology of early-exit networks into its key components and survey the recent advances in each of them. We also position early-exiting against other efficient inference solutions and provide our insights on the current challenges and most promising future directions for research in the field.
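To make the core idea concrete, below is a minimal sketch of confidence-thresholded early exiting: a backbone split into stages with an exit head after each, where inference stops as soon as an intermediate prediction is confident enough. The stage/head layout and the threshold value are illustrative assumptions, not any specific design surveyed in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EarlyExitNet(nn.Module):
    """Toy early-exit network: each stage is followed by its own classifier head."""

    def __init__(self, num_classes=10, threshold=0.9):
        super().__init__()
        # Backbone split into sequential stages (architecture is a placeholder).
        self.stages = nn.ModuleList([
            nn.Sequential(nn.Linear(32, 64), nn.ReLU()),
            nn.Sequential(nn.Linear(64, 64), nn.ReLU()),
            nn.Sequential(nn.Linear(64, 64), nn.ReLU()),
        ])
        # One lightweight exit head per stage.
        self.exits = nn.ModuleList([nn.Linear(64, num_classes) for _ in self.stages])
        self.threshold = threshold  # confidence required to stop early (assumed value)

    @torch.no_grad()
    def forward(self, x):
        # Propagate stage by stage; exit as soon as the current head is confident.
        # The final exit is always taken if no earlier head fires.
        for i, (stage, exit_head) in enumerate(zip(self.stages, self.exits)):
            x = stage(x)
            logits = exit_head(x)
            confidence = F.softmax(logits, dim=-1).max(dim=-1).values
            if i == len(self.stages) - 1 or (confidence >= self.threshold).all():
                return logits, i  # prediction and the index of the exit used

model = EarlyExitNet()
logits, exit_idx = model(torch.randn(1, 32))
print(f"exited at stage {exit_idx}")
```

In this sketch, "easy" inputs leave at an early head and skip the remaining stages, which is where the per-sample computation savings come from; harder inputs fall through to deeper exits.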