Recent papers on the theory of representation learning have shown the importance of a quantity called diversity for generalizing from a set of source tasks to a target task. Most of these papers assume that the function mapping shared representations to predictions is linear for both source and target tasks. In practice, deep learning practitioners attach different numbers of extra layers to a pretrained model depending on the difficulty of the new task. This motivates us to ask whether diversity can be achieved when the source tasks and the target task use different prediction function spaces beyond linear functions. We show that diversity holds even if the target task uses a neural network with multiple layers, as long as the source tasks use linear functions. If the source tasks use nonlinear prediction functions, we provide a negative result, showing that depth-1 neural networks with the ReLU activation function need exponentially many source tasks to achieve diversity. For a general function class, we find that the eluder dimension gives a lower bound on the number of source tasks required for diversity. Our theoretical results imply that simpler tasks generalize better. Although our theoretical results are proved for the global minimizer of the empirical risk, their qualitative predictions still hold for gradient-based optimization algorithms, as verified by our simulations on deep neural networks.
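To make the setting concrete, the following is a minimal PyTorch sketch of the transfer pipeline the abstract describes: a shared representation is trained jointly with linear prediction heads on the source tasks, then frozen and composed with a multi-layer head on the target task. All architecture sizes, synthetic data, and training details here are illustrative assumptions, not the paper's actual experimental configuration.

# Minimal sketch of the source-to-target transfer setup (assumed details,
# not the paper's experiments): linear heads on source tasks, a multi-layer
# head on the target task, shared representation trained by empirical risk
# minimization.
import torch
import torch.nn as nn

d, k, T, n = 20, 5, 10, 100  # input dim, representation dim, number of source tasks, samples per task

# Shared representation learned from the source tasks.
shared = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, k))

# Source tasks: each uses a *linear* prediction head on top of the representation.
source_heads = nn.ModuleList([nn.Linear(k, 1) for _ in range(T)])

opt = torch.optim.Adam(list(shared.parameters()) + list(source_heads.parameters()), lr=1e-3)
X = [torch.randn(n, d) for _ in range(T)]  # synthetic source inputs
Y = [torch.randn(n, 1) for _ in range(T)]  # synthetic source labels

for _ in range(200):  # joint empirical risk minimization over all source tasks
    opt.zero_grad()
    loss = sum(nn.functional.mse_loss(head(shared(x)), y)
               for head, x, y in zip(source_heads, X, Y))
    loss.backward()
    opt.step()

# Target task: freeze the learned representation and fit a *multi-layer* head.
for p in shared.parameters():
    p.requires_grad_(False)
target_head = nn.Sequential(nn.Linear(k, 32), nn.ReLU(), nn.Linear(32, 1))
opt_t = torch.optim.Adam(target_head.parameters(), lr=1e-3)
Xt, Yt = torch.randn(n, d), torch.randn(n, 1)  # synthetic target data

for _ in range(200):
    opt_t.zero_grad()
    nn.functional.mse_loss(target_head(shared(Xt)), Yt).backward()
    opt_t.step()

The question studied in the abstract is whether a representation trained as above (with linear source heads) remains useful when the target head is a deeper network, and how many source tasks are needed when the source heads themselves are nonlinear.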