Multi-Task Learning (MTL) is a widely used and powerful paradigm for training deep neural networks that allows a single backbone to learn multiple objectives. Compared to training tasks separately, MTL significantly reduces computational costs, improves data efficiency, and can enhance model performance by leveraging knowledge across tasks. Hence, it has been adopted in a variety of applications, ranging from computer vision to natural language processing and speech recognition. Within this field, an emerging line of work focuses on manipulating task gradients to derive an aggregated descent direction that benefits all tasks. Despite achieving impressive results on many benchmarks, directly applying these approaches without appropriate regularization may lead to suboptimal solutions on real-world problems. In particular, standard training that minimizes the empirical loss on the training data can easily overfit to low-resource tasks or be spoiled by noisily labeled ones, which can cause negative transfer between tasks and an overall performance drop. To alleviate these problems, we propose to leverage a recently introduced training method, Sharpness-aware Minimization (SAM), which has been shown to improve generalization in single-task learning. Accordingly, we present a novel MTL training methodology that encourages the model to find task-based flat minima, coherently improving its generalization capability across all tasks. Finally, we conduct comprehensive experiments on a variety of applications to demonstrate the merit of our approach over existing gradient-based MTL methods, consistent with our theoretical analysis.
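As a rough illustration of the sharpness-aware idea named in the abstract (not the authors' implementation), the sketch below computes a SAM-style worst-case perturbed gradient separately for each task on a toy two-task setup and averages the resulting gradients for the shared update. The backbone, heads, synthetic data, `rho`, and the plain averaging of task gradients are all illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy multi-task setup (illustrative, not the paper's code):
# a shared backbone with two task-specific heads on synthetic data.
backbone = nn.Sequential(nn.Linear(10, 32), nn.ReLU())
heads = nn.ModuleList([nn.Linear(32, 1), nn.Linear(32, 1)])
params = list(backbone.parameters()) + list(heads.parameters())
opt = torch.optim.SGD(params, lr=1e-2)

x = torch.randn(64, 10)
ys = [torch.randn(64, 1), torch.randn(64, 1)]  # one target per task
rho = 0.05  # radius of the sharpness-aware perturbation (assumed value)

def task_loss(t):
    return nn.functional.mse_loss(heads[t](backbone(x)), ys[t])

for step in range(100):
    opt.zero_grad()
    sam_grads = [torch.zeros_like(p) for p in params]

    for t in range(len(heads)):
        # 1) gradient of task t at the current weights
        grads = torch.autograd.grad(task_loss(t), params, allow_unused=True)
        grads = [torch.zeros_like(p) if g is None else g
                 for g, p in zip(grads, params)]
        norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12

        # 2) ascend to the (approximate) worst-case point in a rho-ball
        eps = [rho * g / norm for g in grads]
        with torch.no_grad():
            for p, e in zip(params, eps):
                p.add_(e)

        # 3) gradient at the perturbed weights: the SAM gradient for task t
        grads_pert = torch.autograd.grad(task_loss(t), params, allow_unused=True)
        with torch.no_grad():
            for p, e in zip(params, eps):
                p.sub_(e)  # restore the original weights
        for acc, g in zip(sam_grads, grads_pert):
            if g is not None:
                acc.add_(g / len(heads))  # simple averaging across tasks

    # 4) descend along the averaged task-based sharpness-aware gradient
    for p, g in zip(params, sam_grads):
        p.grad = g
    opt.step()
```

In practice, gradient-based MTL methods would replace the plain average in step 4 with their own aggregation rule; the sketch only indicates where a per-task flatness-seeking step could fit into the update.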