Many fundamental machine learning tasks can be formulated as problems of learning with vector-valued functions, where we learn multiple scalar-valued functions together. Although some generalization analyses exist for specific algorithms under the empirical risk minimization principle, a unifying analysis of vector-valued learning under a regularization framework is still lacking. In this paper, we initiate the generalization analysis of regularized vector-valued learning algorithms by presenting bounds with a mild dependency on the output dimension and a fast rate on the sample size. Our analysis relaxes existing assumptions on the restrictive constraints of hypothesis spaces, the smoothness of loss functions, and the low-noise condition. To understand the interaction between optimization and learning, we further use our results to derive the first generalization bounds for stochastic gradient descent with vector-valued functions. We apply our general results to multi-class classification and multi-label classification, obtaining the first bounds with a logarithmic dependency on the output dimension for extreme multi-label classification with Frobenius regularization. As a byproduct, we derive a Rademacher complexity bound for loss function classes defined in terms of a general strongly convex function.
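To make the setting concrete, the following is a minimal sketch of one possible instance of regularized vector-valued learning trained with stochastic gradient descent: a linear multi-class predictor x -> W x with softmax cross-entropy loss and squared Frobenius-norm regularization. The specific loss, learning rate, and hyperparameters are illustrative assumptions and are not prescribed by the abstract itself.

```python
import numpy as np

def sgd_frobenius_multiclass(X, y, n_classes, lam=1e-3, lr=0.1, epochs=5, seed=0):
    """Hypothetical instance of regularized vector-valued learning:
        (1/n) * sum_i cross_entropy(W x_i, y_i) + lam * ||W||_F^2,
    minimized with plain SGD over single examples."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.zeros((n_classes, d))        # vector-valued predictor x -> W x
    for _ in range(epochs):
        for i in rng.permutation(n):
            scores = W @ X[i]           # output of dimension n_classes
            scores -= scores.max()      # numerical stability for softmax
            p = np.exp(scores)
            p /= p.sum()
            p[y[i]] -= 1.0              # gradient of cross-entropy w.r.t. scores
            grad = np.outer(p, X[i]) + 2.0 * lam * W
            W -= lr * grad
    return W

# toy usage with random data
X = np.random.randn(200, 10)
y = np.random.randint(0, 5, size=200)
W = sgd_frobenius_multiclass(X, y, n_classes=5)
print(W.shape)  # (5, 10)
```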