Recent research has proposed a series of specialized optimization algorithms for deep multi-task models. It is often claimed that these multi-task optimization (MTO) methods yield solutions that are superior to the ones found by simply optimizing a weighted average of the task losses. In this paper, we perform large-scale experiments on a variety of language and vision tasks to examine the empirical validity of these claims. We show that, despite the added design and computational complexity of these algorithms, MTO methods do not yield any performance improvements beyond what is achievable via traditional optimization approaches. We highlight alternative strategies that consistently yield improvements to the performance profile and point out common training pitfalls that might cause suboptimal results. Finally, we outline challenges in reliably evaluating the performance of MTO algorithms and discuss potential solutions.
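As a point of reference, the "traditional optimization approach" the abstract contrasts against is linear scalarization: minimizing a fixed weighted average of the per-task losses with a standard optimizer, in a single backward pass and with no per-task gradient manipulation. Below is a minimal illustrative sketch of that baseline; it is not code from the paper, and `model`, `task_losses`, `weights`, and both stand-in losses are hypothetical placeholders.

```python
import torch

# Illustrative sketch (not from the paper): the scalarization baseline,
# i.e. minimizing L = sum_i w_i * L_i with a standard optimizer.

def scalarized_loss(task_losses, weights):
    """Combine per-task losses into one scalar: L = sum_i w_i * L_i."""
    return sum(w * l for w, l in zip(weights, task_losses))

model = torch.nn.Linear(16, 4)                 # stand-in multi-task model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(8, 16)
out = model(x)
loss_a = out[:, :2].pow(2).mean()              # stand-in loss for task A
loss_b = out[:, 2:].pow(2).mean()              # stand-in loss for task B

loss = scalarized_loss([loss_a, loss_b], weights=[0.7, 0.3])
optimizer.zero_grad()
loss.backward()                                # one backward pass; no per-task
optimizer.step()                               # gradient surgery as in MTO methods
```

Specialized MTO methods typically replace the fixed weights or the combined gradient with per-step, per-task adjustments; the paper's claim is that, empirically, such adjustments do not improve on a well-tuned version of the baseline above.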