We study the mixed-integer optimization (MIO) approach to feature subset selection in nonlinear kernel support vector machines (SVMs) for binary classification. First proposed for linear regression in the 1970s, this approach has recently moved into the spotlight with advances in optimization algorithms and computer hardware. The goal of this paper is to establish an MIO approach for selecting the best subset of features for kernel SVM classification. To measure the performance of subset selection, we use the kernel-target alignment, which here corresponds to the distance between the centroids of the two response classes in a high-dimensional feature space. We propose a mixed-integer linear optimization (MILO) formulation based on the kernel-target alignment for feature subset selection; this MILO problem can be solved to optimality using standard optimization software. We also derive a reduced version of the MILO problem to accelerate the computation. Experimental results show good computational efficiency for our MILO formulation with the reduced problem. Moreover, our method often outperforms a linear-SVM-based MILO formulation and recursive feature elimination in prediction performance, especially when there are relatively few data instances.
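As a minimal sketch of the alignment measure described above, the squared distance between the two class centroids in feature space can be computed entirely from kernel evaluations (the kernel trick), without forming the high-dimensional map explicitly. The RBF kernel and the function names below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise squared Euclidean distances via broadcasting, then Gaussian kernel.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def centroid_distance(X, y, gamma=1.0):
    """Squared distance between class centroids in RBF feature space:
    ||mu_plus - mu_minus||^2 = mean K(+,+) - 2*mean K(+,-) + mean K(-,-).
    Labels y are assumed to be +1 / -1."""
    Xp, Xm = X[y == 1], X[y == -1]
    Kpp = rbf_kernel(Xp, Xp, gamma)
    Kpm = rbf_kernel(Xp, Xm, gamma)
    Kmm = rbf_kernel(Xm, Xm, gamma)
    return Kpp.mean() - 2.0 * Kpm.mean() + Kmm.mean()
```

In a feature-selection setting, one would evaluate this quantity on candidate feature subsets (restricting the columns of `X`) and prefer subsets that push the class centroids farther apart; the paper's MILO formulation optimizes such a criterion exactly rather than by enumeration.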