This paper considers binary classification of high-dimensional features under a postulated model with a low-dimensional latent Gaussian mixture structure and non-vanishing noise. A generalized least squares estimator is used to estimate the direction of the optimal separating hyperplane. The estimated hyperplane is shown to interpolate the training data. While the direction vector can be consistently estimated, as could be expected from recent results in linear regression, a naive plug-in estimate fails to consistently estimate the intercept. A simple correction that requires an independent hold-out sample renders the procedure minimax optimal in many scenarios. The interpolation property of the corrected procedure can be retained but, surprisingly, depends on the way the labels are encoded.
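
The setting can be illustrated with a minimal numerical sketch. Everything in it is an assumption made for illustration and not the paper's exact procedure: the loading matrix `A`, the latent class mean `alpha`, the dimensions, and the -1/+1 label encoding are hypothetical. The minimum-norm least squares solution (via the Moore-Penrose pseudoinverse) stands in for the generalized least squares direction estimate, and a simple midpoint rule on a hold-out sample stands in for the intercept correction.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n, A, alpha, rng):
    """Draw n points from an assumed latent mixture X = A Z + W."""
    y = rng.choice([-1.0, 1.0], size=n)             # labels encoded as -1/+1 (assumption)
    Z = y[:, None] * alpha + rng.standard_normal((n, alpha.size))
    W = rng.standard_normal((n, A.shape[0]))        # noise that does not vanish
    return Z @ A.T + W, y

p, K, n = 200, 2, 50                                # p > n: high-dimensional regime
A = rng.standard_normal((p, K))                     # hypothetical loading matrix
alpha = np.array([2.0, 2.0])                        # hypothetical latent class mean

X_train, y_train = sample(n, A, alpha, rng)
X_hold, y_hold = sample(n, A, alpha, rng)           # independent hold-out sample
X_test, y_test = sample(1000, A, alpha, rng)

# Minimum-norm least squares direction via the pseudoinverse.
theta = np.linalg.pinv(X_train) @ y_train

# With p > n and full row rank, the fit reproduces the training labels.
interpolates = np.allclose(X_train @ theta, y_train)

# Illustrative intercept from the hold-out sample: the midpoint between the
# two class-mean scores (a stand-in, not the paper's correction).
scores = X_hold @ theta
b = -0.5 * (scores[y_hold > 0].mean() + scores[y_hold < 0].mean())

accuracy = np.mean(np.sign(X_test @ theta + b) == y_test)
print(interpolates, round(float(accuracy), 3))
```

In this sketch the direction is fitted on the training sample only, while the intercept uses the independent hold-out sample, mirroring the split described above. Changing the label encoding (for example to 0/1) changes which targets the fitted values reproduce, which gives an informal sense of why interpolation can depend on the encoding.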