Representation multi-task learning (MTL) and transfer learning (TL) have achieved tremendous success in practice. However, the theoretical understanding of these methods is still lacking. Most existing theoretical works focus on cases where all tasks share the same representation, and claim that MTL and TL almost always improve performance. However, as the number of tasks grows, assuming that all tasks share the same representation is unrealistic. Moreover, this assumption does not always match empirical findings, which suggest that a shared representation does not necessarily improve single-task or target-only learning performance. In this paper, we aim to understand how to learn from tasks with \textit{similar but not exactly the same} linear representations, while dealing with outlier tasks. We propose two algorithms that are \textit{adaptive} to the similarity structure and \textit{robust} to outlier tasks under the MTL and TL settings, respectively. Our algorithms outperform single-task or target-only learning when the representations across tasks are sufficiently similar and the fraction of outlier tasks is small. Furthermore, they never perform worse than single-task or target-only learning, even when the representations are dissimilar. We provide information-theoretic lower bounds showing that our algorithms are nearly \textit{minimax} optimal in a large regime.