Early-exiting dynamic neural networks (EDNNs), one family of dynamic neural networks, have been widely studied recently. A typical EDNN attaches multiple prediction heads at different layers of the network backbone. During inference, the model exits at the first intermediate prediction head whose confidence exceeds a predefined threshold, or at the last prediction head otherwise. To optimize the model, these prediction heads and the network backbone are trained jointly on every batch of training data. This introduces a train-test mismatch: every prediction head is optimized on all types of data during training, while the deeper heads only see hard inputs at test time. Treating inputs differently in the two phases creates a mismatch between the training and testing data distributions. To mitigate this problem, we formulate an EDNN as an additive model inspired by gradient boosting and propose several training techniques to optimize the model effectively. We name our method BoostNet. Our experiments show that it achieves state-of-the-art performance on the CIFAR100 and ImageNet datasets in both anytime and budgeted-batch prediction modes. Our code is released at https://github.com/SHI-Labs/Boosted-Dynamic-Networks.
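To make the early-exit mechanism and the additive, boosting-style combination of heads concrete, below is a minimal PyTorch sketch. It is an illustration under assumed choices, not the paper's implementation: the toy backbone `ToyEDNN`, its stage/head layout, and the `threshold` parameter are all hypothetical, and the loss and detaching strategy of the actual BoostNet training are not shown.

```python
# A minimal sketch (not the paper's exact implementation) of an early-exit
# network whose heads are combined additively, in the spirit of gradient
# boosting, plus confidence-threshold early exiting at inference time.
import torch
import torch.nn as nn


class ToyEDNN(nn.Module):
    def __init__(self, num_classes=100, width=64, num_stages=3):
        super().__init__()
        self.stem = nn.Conv2d(3, width, 3, padding=1)
        self.stages = nn.ModuleList([
            nn.Sequential(nn.Conv2d(width, width, 3, padding=1), nn.ReLU())
            for _ in range(num_stages)
        ])
        # One linear prediction head per stage (one early exit per stage).
        self.heads = nn.ModuleList([
            nn.Linear(width, num_classes) for _ in range(num_stages)
        ])

    def forward(self, x):
        """Training path: return the logits of every exit, where each exit's
        logits are the running (additive) sum of all heads up to that depth."""
        feat = self.stem(x)
        cumulative_logits = 0
        all_logits = []
        for stage, head in zip(self.stages, self.heads):
            feat = stage(feat)
            pooled = feat.mean(dim=(2, 3))  # global average pooling
            # Boosting-style additive prediction: each head refines the
            # accumulated logits produced by the shallower heads.
            cumulative_logits = cumulative_logits + head(pooled)
            all_logits.append(cumulative_logits)
        return all_logits


@torch.no_grad()
def early_exit_predict(model, x, threshold=0.9):
    """Inference path: stop at the first exit whose max softmax confidence
    exceeds the threshold; otherwise fall through to the last exit."""
    feat = model.stem(x)
    last = len(model.heads) - 1
    cumulative_logits = 0
    for k, (stage, head) in enumerate(zip(model.stages, model.heads)):
        feat = stage(feat)
        cumulative_logits = cumulative_logits + head(feat.mean(dim=(2, 3)))
        conf, pred = cumulative_logits.softmax(dim=-1).max(dim=-1)
        if conf.item() >= threshold or k == last:
            return pred.item(), k  # predicted class and exit index


if __name__ == "__main__":
    model = ToyEDNN().eval()
    image = torch.randn(1, 3, 32, 32)  # a single CIFAR-sized input
    pred, exit_idx = early_exit_predict(model, image, threshold=0.9)
    print(f"predicted class {pred} at exit {exit_idx}")
```

In this sketch, easy inputs that already produce a confident accumulated prediction at a shallow exit skip the deeper stages entirely, which is the source of the train-test mismatch described above: during training every exit sees the full data distribution, but at test time deeper exits only receive the inputs that earlier exits could not resolve.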