In practice, multi-task learning (through learning features shared among tasks) is an essential property of deep neural networks (NNs). While infinite-width limits of NNs can provide good intuition for their generalization behavior, the well-known infinite-width limits in the literature (e.g., neural tangent kernels) assume specific settings in which wide ReLU NNs behave like shallow Gaussian processes with a fixed kernel. Consequently, in such settings, these NNs lose their ability to benefit from multi-task learning in the infinite-width limit. In contrast, we prove that optimizing wide ReLU NNs with at least one hidden layer using L2-regularization on the parameters promotes multi-task learning through representation learning, even in the limiting regime where the network width tends to infinity. We present an exact quantitative characterization of this infinite-width limit in an appropriate function space that neatly describes multi-task learning.
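To make the setting concrete, the following is a minimal illustrative sketch (not the paper's construction or proof setup): a wide ReLU network with one shared hidden layer feeding several task-specific output heads, trained with an L2 penalty on all parameters (weight decay). All names, widths, and the toy data are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

class SharedReLUNet(nn.Module):
    """One shared ReLU hidden layer whose features are reused by all task heads."""

    def __init__(self, in_dim: int, width: int, num_tasks: int):
        super().__init__()
        self.shared = nn.Linear(in_dim, width)   # shared representation layer
        self.heads = nn.ModuleList(
            [nn.Linear(width, 1) for _ in range(num_tasks)]
        )                                        # one scalar output head per task

    def forward(self, x: torch.Tensor):
        h = torch.relu(self.shared(x))           # features shared among tasks
        return [head(h) for head in self.heads]


def train_step(model, opt, x, targets):
    # targets: list with one target tensor per task; losses are simply summed
    opt.zero_grad()
    preds = model(x)
    loss = sum(nn.functional.mse_loss(p, t) for p, t in zip(preds, targets))
    loss.backward()
    opt.step()
    return loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    model = SharedReLUNet(in_dim=5, width=1024, num_tasks=3)  # large hidden width
    # weight_decay adds the L2-regularization on all parameters to the objective
    opt = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-3)
    x = torch.randn(64, 5)
    targets = [torch.randn(64, 1) for _ in range(3)]
    for _ in range(100):
        train_step(model, opt, x, targets)
```

The abstract's claim concerns what happens to such L2-regularized, shared-representation training as the hidden width grows without bound; the sketch only fixes the finite-width objective being discussed.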