The support vector machine (SVM) is a well-known statistical technique for classification problems in machine learning and other fields. An important question for SVMs is the selection of covariates (or features) to include in the model. Many studies have addressed this through model selection. As is well known, selecting a single winning model over others can entail considerable instability in predictive performance due to model selection uncertainty. This paper advocates model averaging as an alternative approach, in which estimates obtained from different candidate models are combined in a weighted average. We propose a model weighting scheme and provide the theoretical underpinning for the proposed method. In particular, we prove that our method yields a model average estimator that asymptotically achieves the smallest hinge risk among all feasible weight combinations. To alleviate the computational burden caused by the large number of feasible models, we propose a screening step that eliminates uninformative features before the models are combined. Results from real data applications and a simulation study show that the proposed method generally yields more accurate estimates than existing methods.
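The weighting idea described above can be sketched in code. The snippet below is a minimal illustration, not the paper's exact procedure: the candidate feature subsets, the use of `LinearSVC`, the held-out validation split, and the SLSQP optimizer are all illustrative assumptions. It fits one SVM per candidate feature subset and then chooses simplex weights for their decision functions by minimizing the empirical hinge risk on validation data.

```python
import numpy as np
from sklearn.svm import LinearSVC
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Synthetic data: only the first two of five features are informative.
n, p = 200, 5
X = rng.normal(size=(n, p))
y = np.where(X[:, 0] + X[:, 1] + 0.3 * rng.normal(size=n) > 0, 1, -1)

X_tr, X_val = X[:150], X[150:]
y_tr, y_val = y[:150], y[150:]

# Candidate models: a few feature subsets (a stand-in for the full candidate set).
subsets = [[0], [0, 1], [0, 1, 2], list(range(p))]
m = len(subsets)

# Validation decision values f_j(x) for each candidate SVM, stacked column-wise.
F = np.column_stack([
    LinearSVC(C=1.0).fit(X_tr[:, s], y_tr).decision_function(X_val[:, s])
    for s in subsets
])

def hinge_risk(w):
    """Empirical hinge risk of the weighted decision function sum_j w_j f_j."""
    return np.mean(np.maximum(0.0, 1.0 - y_val * (F @ w)))

# Minimize the hinge risk over the simplex (weights >= 0, summing to 1).
res = minimize(hinge_risk, np.full(m, 1.0 / m),
               bounds=[(0.0, 1.0)] * m,
               constraints={"type": "eq", "fun": lambda w: np.sum(w) - 1.0})
w = res.x
print("weights:", np.round(w, 3))
```

Because the simplex contains every vertex, the minimized risk is never worse (up to optimizer tolerance) than that of the best single candidate model, which is the sense in which averaging dominates selecting one winner.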