Multiclass probability estimation is the problem of estimating the conditional probability that a data point belongs to each class, given its covariate information. It has broad applications in statistical analysis and data science. Recently, a class of weighted Support Vector Machines (wSVMs) has been developed to estimate class probabilities through ensemble learning for $K$-class problems (Wang, Shen and Liu, 2008; Wang, Zhang and Wu, 2019), where $K$ is the number of classes. The estimators are robust and achieve high accuracy for probability estimation, but their learning is implemented through pairwise coupling, which demands polynomial time in $K$. In this paper, we propose two new learning schemes, baseline learning and One-vs-All (OVA) learning, to further improve wSVMs in terms of computational efficiency and estimation accuracy. In particular, baseline learning has optimal computational complexity in the sense that it is linear in $K$. The resulting estimators are distribution-free and shown to be consistent. We further conduct extensive numerical experiments to demonstrate finite-sample performance.
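To make the complexity contrast concrete, the following minimal sketch counts the binary subproblems each scheme would train. It is an illustration under stated assumptions, not the paper's implementation: pairwise coupling is taken to mean one subproblem per unordered pair of classes, baseline learning is assumed to pair every other class with one fixed baseline class, and OVA is assumed to train one class-versus-rest subproblem per class. The function names are hypothetical.

```python
from itertools import combinations


def pairwise_subproblems(K):
    """All unordered class pairs: K * (K - 1) / 2 subproblems (quadratic in K)."""
    return list(combinations(range(K), 2))


def baseline_subproblems(K, baseline=0):
    """Assumed scheme: each non-baseline class vs. one fixed baseline class.

    This yields K - 1 subproblems, i.e. linear in K. The choice of
    baseline class (here index 0) is a placeholder for illustration.
    """
    return [(j, baseline) for j in range(K) if j != baseline]


def ova_subproblems(K):
    """Assumed scheme: each class vs. all remaining classes, K subproblems."""
    return [(j, "rest") for j in range(K)]


if __name__ == "__main__":
    for K in (3, 5, 10):
        print(
            K,
            len(pairwise_subproblems(K)),
            len(baseline_subproblems(K)),
            len(ova_subproblems(K)),
        )
```

Under these assumptions, the gap between the pairwise count and the two linear schemes widens quickly as $K$ grows, which is the efficiency gain described above.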