Learning from multiple related tasks by knowledge sharing and transfer has become increasingly relevant over the last two decades. In order to successfully transfer information from one task to another, it is critical to understand the similarities and differences between the domains. In this paper, we introduce the notion of \emph{performance gap}, an intuitive and novel measure of the distance between learning tasks. Unlike existing measures, which are used as tools to bound the difference in expected risks between tasks (e.g., $\mathcal{H}$-divergence or discrepancy distance), we theoretically show that the performance gap can be viewed as a data- and algorithm-dependent regularizer, which controls the model complexity and leads to finer guarantees. More importantly, it also provides new insights and motivates a novel principle for designing strategies for knowledge sharing and transfer: gap minimization. We instantiate this principle with two algorithms: 1. {gapBoost}, a novel and principled boosting algorithm that explicitly minimizes the performance gap between source and target domains for transfer learning; and 2. {gapMTNN}, a representation learning algorithm that reformulates gap minimization as semantic conditional matching for multitask learning. Our extensive evaluation on both transfer learning and multitask learning benchmark data sets shows that our methods outperform existing baselines.