Hard parameter sharing in multi-task learning (MTL) allows tasks to share a subset of model parameters, reducing storage cost and improving prediction accuracy. The common practice is to share the bottom layers of a deep neural network among tasks while using separate top layers for each task. In this work, we revisit this common practice through an empirical study on fine-grained image classification tasks and make two surprising observations. (1) Using separate bottom-layer parameters can achieve significantly better performance than the common practice, and this phenomenon holds across different numbers of jointly trained tasks, different backbone architectures, and different quantities of task-specific parameters. (2) A multi-task model with a small proportion of task-specific parameters in the bottom layers can achieve performance competitive with independent models trained on each task separately, and can outperform a state-of-the-art MTL framework. Our observations suggest rethinking the current sharing paradigm and adopting the new strategy of separate bottom-layer parameters as a stronger baseline for model design in MTL.
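For concreteness, below is a minimal sketch (not the paper's code) contrasting the two parameter layouts described above: the common hard-sharing layout with a shared bottom and task-specific tops, and the separate-bottom variant with task-specific bottom layers. The PyTorch framing, layer sizes, two-task setup, and identical label-space sizes are illustrative assumptions; in the paper's fine-grained classification setting each task would typically still keep its own classifier head, and the exact split between shared and task-specific layers varies.

```python
# Illustrative sketch only; hyperparameters and the shared-top simplification
# are assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn

class SharedBottomMTL(nn.Module):
    """Common practice: one shared bottom, one task-specific head per task."""
    def __init__(self, in_dim=512, hidden=256, num_classes=(100, 100)):
        super().__init__()
        self.bottom = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList([nn.Linear(hidden, c) for c in num_classes])

    def forward(self, x, task_id):
        return self.heads[task_id](self.bottom(x))

class SeparateBottomMTL(nn.Module):
    """Variant studied here: task-specific bottom layers, shared upper layers."""
    def __init__(self, in_dim=512, hidden=256, num_classes=100, num_tasks=2):
        super().__init__()
        self.bottoms = nn.ModuleList(
            [nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
             for _ in range(num_tasks)]
        )
        self.top = nn.Linear(hidden, num_classes)  # shared across tasks

    def forward(self, x, task_id):
        return self.top(self.bottoms[task_id](x))

x = torch.randn(4, 512)
print(SharedBottomMTL()(x, task_id=0).shape)    # torch.Size([4, 100])
print(SeparateBottomMTL()(x, task_id=1).shape)  # torch.Size([4, 100])
```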