In Multi-Task Learning (MTL), it is common practice to train multi-task networks by optimizing an objective function that is a weighted average of the task-specific objective functions. Although the computational advantages of this strategy are clear, the complexity of the resulting loss landscape has not been studied in the literature. Arguably, its optimization may be more difficult than a separate optimization of the constituent task-specific objectives. In this work, we investigate the benefits of such an alternative by alternating independent gradient descent steps on the different task-specific objective functions, and we formulate a novel way to combine this approach with state-of-the-art optimizers. As the separation of task-specific objectives comes at the cost of increased computational time, we propose random task grouping as a trade-off between better optimization and computational efficiency. Experimental results on three well-known visual MTL datasets show better overall absolute performance on losses and standard metrics compared to an averaged objective function and other state-of-the-art MTL methods. In particular, our method shows the most benefit when dealing with tasks of a different nature, and it enables a wider exploration of the shared parameter space. We also show that our random grouping strategy allows us to trade off between these benefits and computational efficiency.
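As a rough illustration of the contrast described above, the sketch below shows a weighted-average MTL update, fully alternating per-task updates, and a random-grouping middle ground. This is a minimal PyTorch-style sketch under assumed interfaces, not the authors' implementation: the model, optimizer, per-task loss callables, and the `group_size` parameter are placeholders introduced here for illustration.

```python
# Minimal sketch (not the paper's code): three ways to update shared parameters
# given a list of per-task loss callables `task_losses`, each mapping
# (model, batch) -> scalar loss tensor. All names here are hypothetical.
import random
import torch

def averaged_step(model: torch.nn.Module, optimizer: torch.optim.Optimizer,
                  batch, task_losses, weights):
    """One step on the weighted average of all task-specific losses."""
    optimizer.zero_grad()
    total = sum(w * loss_fn(model, batch) for w, loss_fn in zip(weights, task_losses))
    total.backward()
    optimizer.step()

def alternating_steps(model: torch.nn.Module, optimizer: torch.optim.Optimizer,
                      batch, task_losses):
    """An independent gradient descent step per task, taken in sequence."""
    for loss_fn in task_losses:
        optimizer.zero_grad()
        loss_fn(model, batch).backward()
        optimizer.step()

def grouped_steps(model: torch.nn.Module, optimizer: torch.optim.Optimizer,
                  batch, task_losses, group_size: int):
    """Random task grouping: average losses within each randomly drawn group
    and take one step per group, trading optimization benefits for speed."""
    order = list(range(len(task_losses)))
    random.shuffle(order)
    for i in range(0, len(order), group_size):
        group = order[i:i + group_size]
        optimizer.zero_grad()
        loss = sum(task_losses[t](model, batch) for t in group) / len(group)
        loss.backward()
        optimizer.step()
```

With `group_size=1` the grouped variant reduces to fully alternating per-task steps, and with `group_size` equal to the number of tasks it reduces to a single averaged update, which is how the grouping acts as a knob between the two regimes.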