Multiclass probability estimation is the problem of estimating the conditional probability that a data point belongs to each class, given its covariate information. It has broad applications in statistical analysis and data science. Recently, a class of weighted Support Vector Machines (wSVMs) has been developed to estimate class probabilities through ensemble learning for $K$-class problems (Wu, Zhang and Liu, 2010; Wang, Zhang and Wu, 2019), where $K$ is the number of classes. These estimators are robust and achieve high accuracy in probability estimation, but their learning is implemented through pairwise coupling, which demands polynomial time in $K$. In this paper, we propose two new learning schemes, baseline learning and One-vs-All (OVA) learning, to further improve wSVMs in terms of computational efficiency and estimation accuracy. In particular, baseline learning has optimal computational complexity in the sense that it is linear in $K$. Although it is not the most efficient in computation, OVA learning offers the best estimation accuracy among all the procedures under comparison. The resulting estimators are distribution-free and are shown to be consistent. We further conduct extensive numerical experiments to demonstrate their finite-sample performance.
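To make the scaling in $K$ concrete, the following minimal sketch counts the binary subproblems each scheme would need to enumerate. It is an illustration only, not the paper's implementation: the function names are hypothetical, and the baseline scheme is assumed here to pair every other class with one fixed reference class, which the abstract does not spell out. The count for pairwise coupling, $K(K-1)/2$, follows directly from taking all unordered class pairs.

```python
from itertools import combinations


def pairwise_pairs(K):
    """All unordered class pairs used by pairwise coupling: K(K-1)/2 of them."""
    return list(combinations(range(1, K + 1), 2))


def baseline_pairs(K, baseline=1):
    """ASSUMPTION: baseline learning pairs each class with one fixed
    reference class, giving K-1 binary problems (linear in K)."""
    return [(baseline, k) for k in range(1, K + 1) if k != baseline]


def ova_tasks(K):
    """One-vs-All: one binary problem per class against all the rest (K problems)."""
    return [(k, "rest") for k in range(1, K + 1)]


if __name__ == "__main__":
    for K in (3, 5, 10, 20):
        print(K, len(pairwise_pairs(K)), len(baseline_pairs(K)), len(ova_tasks(K)))
```

Under these assumptions the number of pairwise subproblems grows quadratically while the other two grow linearly. Note that the count alone does not capture total cost: each OVA subproblem uses the full training set, which is consistent with OVA not being the most efficient scheme in computation.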