We provide an information-theoretic analysis of the generalization ability of Gibbs-based transfer learning algorithms by focusing on two popular transfer learning approaches, $\alpha$-weighted empirical risk minimization ($\alpha$-weighted-ERM) and two-stage-ERM. Our key result is an exact characterization of the generalization behavior using the conditional symmetrized KL information between the output hypothesis and the target training samples, given the source samples. Our results can also be applied to derive novel distribution-free upper bounds on the generalization error of these two Gibbs algorithms. Our approach is versatile, as it also characterizes the generalization errors and excess risks of these two Gibbs algorithms in the asymptotic regime, where they converge to the $\alpha$-weighted-ERM and two-stage-ERM solutions, respectively. Based on our theoretical results, we show that the benefits of transfer learning can be viewed as a bias-variance trade-off, with the bias induced by the source distribution and the variance induced by the scarcity of target samples. We believe this viewpoint can guide the choice of transfer learning algorithms in practice.
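To fix ideas, here is a minimal sketch of the $\alpha$-weighted Gibbs algorithm in the standard Gibbs-posterior form; the prior $\pi$, inverse temperature $\gamma$, and empirical risk $\hat{L}_E$ are illustrative symbols not defined in this abstract:
$$
P_{W \mid S_T, S_S}(w) \;=\; \frac{\pi(w)\, \exp\!\big(-\gamma \big[\alpha \hat{L}_E(w, S_T) + (1-\alpha)\,\hat{L}_E(w, S_S)\big]\big)}{\int \pi(w')\, \exp\!\big(-\gamma \big[\alpha \hat{L}_E(w', S_T) + (1-\alpha)\,\hat{L}_E(w', S_S)\big]\big)\,\mathrm{d}w'},
$$
where $S_T$ and $S_S$ denote the target and source training samples. As $\gamma \to \infty$, this posterior concentrates on the minimizers of the weighted empirical risk, recovering the $\alpha$-weighted-ERM solution $\hat{w} \in \arg\min_{w}\, \alpha \hat{L}_E(w, S_T) + (1-\alpha)\,\hat{L}_E(w, S_S)$; the two-stage variant instead applies a Gibbs step (and, in the same limit, an ERM step) on the source samples first and then on the target samples.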