Large-scale pre-trained model tuning has made remarkable progress recently, and inference efficiency is becoming increasingly crucial for practical deployment. Early exiting with multi-stage predictors, when combined with a parameter-efficient fine-tuning strategy, offers a straightforward way to obtain an inference-efficient model. However, a key challenge remains unresolved: how can early stages provide low-level fundamental features to deep stages while simultaneously supplying high-level discriminative features to early-stage predictors? To address this problem, we propose a Decoupled Multi-Predictor Optimization (DMPO) method that decouples the low-level representative ability and the high-level discriminative ability of early stages. In terms of architecture, we introduce a lightweight bypass module into the multi-stage predictors to functionally decompose the shallow features from early stages, and we develop a high-order statistics-based predictor for early stages to enhance their discriminative ability. In terms of optimization, we propose a decoupled strategy that assigns two-phase loss weights to the multi-stage predictors during model tuning: the initial training phase emphasizes the representative ability of early stages so that the model first acquires discriminative ability in its deep stages, and the later training phase pushes discriminative ability toward earlier stages as far as possible. In this way, DMPO decouples representative and discriminative abilities in early stages through both architecture design and model optimization. Experiments across various datasets and pre-trained backbones show that DMPO clearly outperforms its counterparts when computational cost is reduced.
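
The two-phase loss weighting and the early-exit inference described above can be illustrated with a minimal sketch. This is not the DMPO implementation: the function names, the switch point (`switch_frac`), the phase-one weight for early predictors (`early_weight`), and the max-softmax confidence threshold are all hypothetical choices made for illustration, and the bypass module and high-order statistics-based predictor are not modeled.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def stage_loss_weights(epoch: int, total_epochs: int, num_stages: int,
                       switch_frac: float = 0.5,
                       early_weight: float = 0.1) -> List[float]:
    """Hypothetical two-phase weights for the per-stage predictor losses.

    Phase 1: the final predictor dominates, so early stages are mostly
    trained to supply features to deeper stages.
    Phase 2: all predictors are weighted equally, which pushes
    discriminative ability toward earlier stages.
    (switch_frac and early_weight are assumed values, not from the paper.)
    """
    if not 0 <= epoch < total_epochs:
        raise ValueError("epoch must satisfy 0 <= epoch < total_epochs")
    if epoch < int(switch_frac * total_epochs):
        return [early_weight] * (num_stages - 1) + [1.0]
    return [1.0] * num_stages


def combined_loss(stage_losses: Sequence[float],
                  weights: Sequence[float]) -> float:
    """Weighted sum of the per-stage losses."""
    return sum(w * l for w, l in zip(weights, stage_losses))


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(stage_logits: Iterable[Sequence[float]],
                       threshold: float = 0.9) -> Tuple[int, int]:
    """Return (exit_stage, predicted_class).

    Exits at the first stage whose top softmax probability reaches the
    threshold; otherwise falls back to the last stage. Passing a generator
    lets later stages be skipped entirely once an exit is taken.
    """
    result = None
    for i, logits in enumerate(stage_logits):
        probs = softmax(logits)
        conf = max(probs)
        result = (i, probs.index(conf))
        if conf >= threshold:
            break
    if result is None:
        raise ValueError("stage_logits must contain at least one stage")
    return result
```

In a real training loop the per-stage losses would come from the predictors attached to each stage of the backbone, and `stage_loss_weights` would be evaluated once per epoch; the hard switch between phases used here is only one of several plausible schedules.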