A key problem in multi-task learning (MTL) research is how to select high-quality auxiliary tasks automatically. This paper presents GradTS, an automatic auxiliary task selection method based on gradient calculation in Transformer-based models. Compared to AUTOSEM, a strong baseline method, GradTS improves the performance of MT-DNN with a bert-base-cased backend model by 0.33% to 17.93% on 8 natural language understanding (NLU) tasks in the GLUE benchmark. GradTS is also time-saving since (1) its gradient calculations are based on single-task experiments and (2) the gradients are re-used without additional experiments when the candidate task set changes. On the 8 GLUE classification tasks, for example, GradTS costs on average 21.32% less time than AUTOSEM with comparable GPU consumption. Further, we show the robustness of GradTS across various task settings and model selections, e.g., mixed objectives among candidate tasks. The efficiency and efficacy of GradTS in these case studies illustrate its general applicability in MTL research without requiring manual task filtering or costly parameter tuning.
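To make the selection mechanism concrete, here is a minimal sketch, not the authors' implementation: it assumes head importance is the accumulated absolute gradient on each attention head's query/key/value weights during single-task training passes, and that candidate tasks are ranked by Pearson correlation with the primary task's importance vector. `TinyClassifier`, `fake_batches`, and all sizes are illustrative stand-ins for a BERT backbone and GLUE data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class TinyClassifier(nn.Module):
    """A small Transformer encoder + linear head, standing in for BERT."""
    def __init__(self, d=32, n_heads=4, n_layers=2, n_classes=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d, n_heads, dim_feedforward=64,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d, n_classes)

    def forward(self, x):                       # x: (batch, seq, d)
        return self.head(self.encoder(x).mean(dim=1))

def head_importance(model, batches, n_heads):
    """Accumulate |grad| on each attention head's Q/K/V rows over
    ordinary single-task backward passes (one importance per head)."""
    layers = model.encoder.layers
    d = layers[0].self_attn.embed_dim
    hd = d // n_heads                           # rows per head
    imp = torch.zeros(len(layers), n_heads)
    loss_fn = nn.CrossEntropyLoss()
    for x, y in batches:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for li, layer in enumerate(layers):
            g = layer.self_attn.in_proj_weight.grad.abs()   # (3d, d)
            for h in range(n_heads):
                for block in range(3):          # Q, K, V blocks stacked in rows
                    s = block * d + h * hd
                    imp[li, h] += g[s:s + hd].sum()
    return imp.flatten()

def pearson(a, b):
    a, b = a - a.mean(), b - b.mean()
    return (a @ b / (a.norm() * b.norm() + 1e-8)).item()

def fake_batches(n=4):
    """Synthetic (inputs, labels) batches; a real run would use task data."""
    return [(torch.randn(8, 10, 32), torch.randint(0, 2, (8,))) for _ in range(n)]

# One single-task experiment per task; the importance vectors can be cached
# and re-used when the candidate task set changes.
tasks = {name: fake_batches() for name in ["primary", "cand_a", "cand_b"]}
imps = {name: head_importance(TinyClassifier(), batches, n_heads=4)
        for name, batches in tasks.items()}

# Rank candidates by correlation with the primary task's head importances.
ranking = sorted((k for k in imps if k != "primary"),
                 key=lambda k: pearson(imps["primary"], imps[k]), reverse=True)
print(ranking)
```

The cacheability noted in the abstract falls out of this structure: each task's importance vector depends only on its own single-task run, so adding or removing a candidate requires no re-training of the others.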