Margin-based classifiers have been popular in both machine learning and statistics for classification problems. Since a large number of classifiers are available, a natural question is which type of classifier should be used for a given classification task. We answer this question by investigating the asymptotic performance of a family of large-margin classifiers under two-component mixture models in the regime where the data dimension $p$ and the sample size $n$ are both large. This family covers a broad range of classifiers, including the support vector machine, distance weighted discrimination, penalized logistic regression, and the large-margin unified machine as special cases. The asymptotic results are characterized by a set of nonlinear equations, and we observe a close match between their predictions and Monte Carlo simulations on finite data samples. Our analytical study sheds new light on how to select the best classifier among various classification methods, as well as on how to choose the optimal tuning parameters for a given method.
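To make the setting concrete, the following is a minimal sketch (not the paper's actual experimental code) of the kind of Monte Carlo experiment the abstract refers to: data are drawn from a two-component Gaussian mixture in dimension $p$ with $n$ samples, and a linear large-margin classifier is fit by minimizing a ridge-penalized hinge loss (the SVM member of the family) via subgradient descent. All function names, the signal strength `mu`, and the optimization parameters are illustrative assumptions; swapping the hinge loss for the logistic or other large-margin losses yields the other members of the family.

```python
import math
import random

def simulate_mixture(n, p, mu=3.0, seed=0):
    """Two-component Gaussian mixture: label y = +/-1 with equal probability,
    x | y ~ N(y * mu / sqrt(p) * 1, I_p), so the total signal norm is mu."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.5 else -1
        x = [label * mu / math.sqrt(p) + rng.gauss(0.0, 1.0) for _ in range(p)]
        X.append(x)
        y.append(label)
    return X, y

def fit_margin_classifier(X, y, lam=0.01, lr=0.1, epochs=50):
    """Linear classifier minimizing (1/n) sum hinge(y_i w.x_i) + (lam/2)||w||^2
    by full-batch subgradient descent. Illustrative hyperparameters only."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(epochs):
        grad = [lam * wj for wj in w]           # ridge-penalty gradient
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xij for wj, xij in zip(w, xi))
            if margin < 1:                      # hinge-loss subgradient active
                for j in range(p):
                    grad[j] -= yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def error_rate(w, X, y):
    """Fraction of misclassified points under sign(w.x)."""
    mistakes = sum(1 for xi, yi in zip(X, y)
                   if yi * sum(wj * xij for wj, xij in zip(w, xi)) <= 0)
    return mistakes / len(X)

# One Monte Carlo run with p and n both large (and of comparable size),
# matching the high-dimensional regime studied in the abstract.
X_train, y_train = simulate_mixture(n=200, p=100, seed=1)
X_test, y_test = simulate_mixture(n=1000, p=100, seed=2)
w_hat = fit_margin_classifier(X_train, y_train)
test_error = error_rate(w_hat, X_test, y_test)
```

Averaging `test_error` over many independent draws gives the finite-sample curves that the abstract reports as closely matching the solutions of the nonlinear asymptotic equations.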