Sufficient dimension reduction has received considerable interest over the past 30 years. Most existing approaches focus on statistical models that link the response to the covariates through a regression equation, and are therefore not well suited to binary classification problems. We address dimension reduction for binary classification by fitting a localized nearest-neighbor logistic model with an $\ell_1$ penalty in order to estimate the gradient of the conditional probability of interest. Our theoretical analysis shows that the pointwise convergence rate of this gradient estimator is optimal under very mild conditions. The dimension reduction subspace is then estimated from the outer product of such gradient estimates at several points in the covariate space. Our implementation selects the dimension of this subspace by cross-validation on the misclassification rate. We find that the proposed approach outperforms existing competitors in applications to synthetic and real data.
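The estimation pipeline described above (local logistic fit, gradient at a point, outer product across points) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The helper names `local_l1_logistic_gradient` and `opg_subspace`, the neighbourhood size `k`, the penalty `lam`, the number of anchor points, and the use of plain proximal gradient descent (ISTA) as the solver are all illustrative assumptions. The subspace dimension `d` is fixed by hand here rather than chosen by cross-validation on the misclassification rate.

```python
import numpy as np


def _sigmoid(t):
    # Numerically stable logistic function: sigma(t) = (1 + tanh(t/2)) / 2.
    return 0.5 * (1.0 + np.tanh(0.5 * t))


def local_l1_logistic_gradient(X, y, x0, k=200, lam=0.02, n_iter=500):
    """Hypothetical helper: estimate grad of P(Y=1 | X=x) at x = x0.

    Fits logit P(Y=1|x) ~= a + b^T (x - x0) on the k nearest neighbours of x0,
    with an l1 penalty on the slope b, via proximal gradient descent (ISTA).
    """
    dist = np.linalg.norm(X - x0, axis=1)
    idx = np.argsort(dist)[:k]                    # k nearest neighbours of x0
    Z = X[idx] - x0                               # local coordinates centred at x0
    yk = y[idx].astype(float)
    D = np.hstack([np.ones((len(idx), 1)), Z])    # intercept column + local coordinates
    # Step 1/L, where L bounds the Lipschitz constant of the mean logistic-loss gradient.
    L = 0.25 * np.linalg.eigvalsh(D.T @ D / len(idx)).max()
    step = 1.0 / L
    theta = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (_sigmoid(D @ theta) - yk) / len(idx)
        theta = theta - step * grad
        # Soft-threshold the slopes only; the intercept is left unpenalised.
        b = theta[1:]
        theta[1:] = np.sign(b) * np.maximum(np.abs(b) - step * lam, 0.0)
    a, b = theta[0], theta[1:]
    p0 = _sigmoid(a)
    # Chain rule: gradient of sigmoid(a + b^T (x - x0)) evaluated at x = x0.
    return p0 * (1.0 - p0) * b


def opg_subspace(X, y, d, n_anchors=50, seed=0, **kwargs):
    """Hypothetical helper: average outer products of local gradients, keep top-d eigenvectors."""
    rng = np.random.default_rng(seed)
    anchors = X[rng.choice(len(X), size=n_anchors, replace=False)]
    p = X.shape[1]
    M = np.zeros((p, p))
    for x0 in anchors:
        g = local_l1_logistic_gradient(X, y, x0, **kwargs)
        M += np.outer(g, g)
    M /= n_anchors
    _, eigvec = np.linalg.eigh(M)                 # eigenvalues in ascending order
    return eigvec[:, ::-1][:, :d]                 # p x d basis of the estimated subspace


# Usage on synthetic data where P(Y=1|x) depends only on the first coordinate.
rng = np.random.default_rng(1)
X = rng.standard_normal((2000, 5))
y = (rng.random(2000) < _sigmoid(3.0 * X[:, 0])).astype(int)
B = opg_subspace(X, y, d=1)
print(np.round(np.abs(B[:, 0]), 2))  # weight should concentrate on the first coordinate
```

Eigenvectors are only defined up to sign, which is why the usage line inspects absolute values. Because the gradient is scaled by $\hat p(x_0)\{1-\hat p(x_0)\}$, anchor points where the fitted probability is close to 0 or 1 contribute little to the averaged outer product.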