Multi-task learning (MTL) has been widely applied in online advertising and recommender systems. To address the negative-transfer issue, recent studies have proposed optimization methods that focus on aligning gradient directions or magnitudes. However, since prior studies have shown that both general and task-specific knowledge coexist within the limited shared capacity, overemphasizing gradient alignment may crowd out task-specific knowledge, and vice versa. In this paper, we propose a transference-driven approach, CoGrad, that adaptively maximizes knowledge transference via Coordinated Gradient modification. We explicitly quantify the transference from one task to another as the loss reduction it induces, and then derive an auxiliary gradient by optimizing this quantity. We perform the optimization by incorporating this auxiliary gradient into the original task gradients, so that the model automatically maximizes inter-task transfer while minimizing individual task losses. Thus, CoGrad can harmonize general and task-specific knowledge to boost overall performance. In addition, we introduce an efficient approximation of the Hessian matrix, making CoGrad computationally efficient and simple to implement. Both offline and online experiments verify that CoGrad significantly outperforms previous methods.
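To make the idea concrete, the following is a minimal NumPy sketch of the coordinated-gradient update, under stated assumptions: the two quadratic losses `loss_a`/`loss_b`, the step size `lr`, and the weighting `alpha` are hypothetical stand-ins, and the nested finite-difference gradient is used in place of the paper's efficient Hessian approximation (differentiating the transference through the inner gradient step is exactly where the Hessian of the source task's loss appears).

```python
import numpy as np

# Two toy quadratic task losses sharing parameters theta (hypothetical
# stand-ins for real task objectives; the update rule itself is model-agnostic).
def loss_a(theta):
    return 0.5 * np.sum((theta - np.array([1.0, 0.0])) ** 2)

def loss_b(theta):
    return 0.5 * np.sum((theta - np.array([0.8, 0.2])) ** 2)

def grad(f, theta, eps=1e-5):
    # Central finite-difference gradient (a cheap stand-in for backprop).
    g = np.zeros_like(theta)
    for k in range(theta.size):
        e = np.zeros_like(theta)
        e[k] = eps
        g[k] = (f(theta + e) - f(theta - e)) / (2 * eps)
    return g

def transference(theta, f_src, f_dst, lr):
    # Transference of the source task onto the destination task, quantified as
    # the destination-loss reduction after one step along the source gradient:
    #   T = L_dst(theta) - L_dst(theta - lr * grad L_src(theta))
    g_src = grad(f_src, theta)
    return f_dst(theta) - f_dst(theta - lr * g_src)

def cograd_step(theta, lr=0.1, alpha=0.5):
    g_a = grad(loss_a, theta)
    g_b = grad(loss_b, theta)
    # Auxiliary gradient: the gradient of the transference w.r.t. theta.
    # Its exact form involves the Hessian of loss_a; here it is approximated
    # numerically by differentiating through the inner gradient step.
    g_t = grad(lambda t: transference(t, loss_a, loss_b, lr), theta)
    # Coordinated update: descend the task losses while ascending transference.
    return theta - lr * (g_a + g_b - alpha * g_t)
```

A typical usage would iterate `cograd_step` from an initial `theta`; on these toy losses the iterates settle near the joint optimum of the two tasks while the auxiliary term nudges the path toward parameter regions that help both.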