Using an overparameterized linear model with Gaussian features, we provide conditions under which minimum-norm interpolating solutions generalize well in multiclass classification, in an asymptotic setting where both the number of underlying features and the number of classes scale with the number of training points. We adapt the survival/contamination analysis framework for understanding overparameterized learning problems to this setting. The analysis reveals that multiclass classification behaves qualitatively like binary classification: as long as there are not too many classes (made precise in the paper), it is possible to generalize well even in some settings where the corresponding regression tasks would not generalize. Besides various technical challenges, the key difference from the binary classification setting is that each class has relatively fewer positive training examples as the number of classes increases, which makes the multiclass problem "harder" than the binary one.
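To make the object of study concrete, the following is a minimal sketch of a minimum-norm interpolating solution for a multiclass problem with Gaussian features. The abstract does not specify the data model, so everything here is an illustrative assumption rather than the paper's construction: the features are isotropic, the label is taken to be the index of the largest of the first `k` features, the targets are one-hot encoded, and the function name `min_norm_multiclass` and its parameters are hypothetical.

```python
import numpy as np

def min_norm_multiclass(n=50, d=400, k=4, seed=0):
    """Fit a minimum-norm interpolator on synthetic multiclass data.

    Assumptions (not from the paper): isotropic Gaussian features and
    labels given by the argmax over the first k features.
    """
    rng = np.random.default_rng(seed)
    # Overparameterized regime: more features (d) than training points (n).
    X = rng.standard_normal((n, d))
    # Hypothetical labeling rule: class = index of the largest of the first k features.
    labels = np.argmax(X[:, :k], axis=1)
    # One-hot targets, one column per class.
    Y = np.eye(k)[labels]
    # Among all W with X @ W = Y, the pseudoinverse picks the one with the
    # smallest Frobenius norm (column by column, the minimum l2-norm solution).
    W = np.linalg.pinv(X) @ Y
    return X, labels, Y, W

X, labels, Y, W = min_norm_multiclass()
# The solution interpolates: it reproduces the one-hot targets exactly
# (up to floating-point error), so every training point is classified correctly.
interpolates = np.allclose(X @ W, Y)
train_pred = np.argmax(X @ W, axis=1)
```

Because the fit interpolates, training accuracy says nothing about generalization; whether the argmax prediction is correct on fresh samples depends on the feature covariance and on how `k` and `d` scale with `n`, which is the question the paper analyzes. In this one-hot encoding each class column of `Y` has only about `n / k` nonzero entries, which reflects the "fewer positive training examples per class" effect described above.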