Multitask learning is being increasingly adopted in application domains such as computer vision and reinforcement learning. However, optimally exploiting its advantages remains a major challenge due to the effect of negative transfer. Previous works have traced this issue back to the disparities in gradient magnitudes and directions across tasks when optimizing the shared network parameters. While recent work has acknowledged that negative transfer is a two-fold problem, existing approaches fall short as they focus only on either homogenizing the gradient magnitudes across tasks, or greedily changing the gradient directions, overlooking future conflicts. In this work, we introduce RotoGrad, an algorithm that tackles negative transfer as a whole: it jointly homogenizes gradient magnitudes and directions, while ensuring training convergence. We show that RotoGrad outperforms competing methods on complex problems, including multi-label classification on CelebA and computer vision tasks on the NYUv2 dataset. A PyTorch implementation can be found at https://github.com/adrianjav/rotograd .
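To make the gradient-magnitude side of the problem concrete, below is a minimal, illustrative PyTorch sketch of homogenizing per-task gradient norms on a shared encoder. It is not the released RotoGrad implementation and omits the direction (rotation) component and convergence machinery; `encoder`, `heads`, `losses`, and `targets` are hypothetical placeholders.

```python
# Illustrative sketch only: rescales each task's gradient w.r.t. the shared
# representation to a common norm so that no single task dominates the update.
# This covers only the magnitude half of negative transfer; RotoGrad additionally
# aligns gradient directions. Names below are placeholders, not library API.
import torch

def homogenized_shared_backward(encoder, heads, losses, x, targets):
    """Backpropagate a magnitude-homogenized combination of per-task gradients."""
    z = encoder(x)  # shared representation
    per_task_grads = []
    for head, loss_fn, y in zip(heads, losses, targets):
        loss = loss_fn(head(z), y)
        # gradient of this task's loss w.r.t. the shared features z
        (g,) = torch.autograd.grad(loss, z, retain_graph=True)
        per_task_grads.append(g)

    # rescale every task gradient to the mean norm before summing
    norms = torch.stack([g.norm() for g in per_task_grads])
    target_norm = norms.mean()
    combined = sum(g * (target_norm / (n + 1e-12))
                   for g, n in zip(per_task_grads, norms))

    # push the combined gradient through the shared encoder
    z.backward(combined)
```

The full method additionally rotates the shared feature space seen by each task head so that gradient directions are aligned over training rather than corrected greedily step by step; see the repository linked above for the actual implementation.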