Linear Discriminant Analysis (LDA) is an important classification approach. Its simple linear form makes it easy to interpret, and it can handle multi-class responses. It is closely related to other classical multivariate statistical techniques, such as Fisher's discriminant analysis, canonical correlation analysis, and linear regression. In this paper we strengthen its connection to multivariate response regression by characterizing the explicit relationship between the discriminant directions and the regression coefficient matrix. This key characterization leads to a new regression-based multi-class classification procedure that is flexible enough to accommodate any existing structured, regularized, or even non-parametric regression method. Moreover, our new formulation is amenable to analysis: we establish a general strategy for analyzing the excess misclassification risk of the proposed classifier for all of the aforementioned regression techniques. As applications, we provide complete theoretical guarantees for the widely used $\ell_1$-regularization and for reduced-rank regression, neither of which has yet been fully analyzed in the LDA context. Our theoretical findings are corroborated by extensive simulation studies and real data analysis.
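
To make the link between discriminant directions and the regression coefficient matrix concrete, the following is a minimal numerical sketch of the classical least-squares version of this relationship; it is not necessarily the exact characterization derived in the paper. It regresses one-hot class indicators on centered features and checks that the columns of the resulting coefficient matrix lie in the span of the within-class scatter inverse applied to the class-mean differences. The function name `lda_vs_regression`, the simulated Gaussian data, and all parameter values are assumptions made for illustration only.

```python
import numpy as np


def lda_vs_regression(n_per_class=100, p=5, K=3, seed=0):
    """Compare OLS coefficients on class indicators with LDA directions.

    Hypothetical helper for illustration; the data are simulated.
    """
    rng = np.random.default_rng(seed)

    # Simulated data: K Gaussian classes with a shared identity covariance.
    means = rng.normal(scale=2.0, size=(K, p))
    labels = np.repeat(np.arange(K), n_per_class)
    X = means[labels] + rng.normal(size=(K * n_per_class, p))
    Y = np.eye(K)[labels]  # n x K one-hot response matrix

    # Multivariate response regression of Y on centered X (p x K coefficients).
    Xc = X - X.mean(axis=0)
    B, *_ = np.linalg.lstsq(Xc, Y, rcond=None)

    # Sample discriminant directions: S_w^{-1} (class mean - overall mean).
    class_means = np.stack([X[labels == k].mean(axis=0) for k in range(K)])
    resid = X - class_means[labels]
    S_w = resid.T @ resid  # within-class scatter matrix
    M = (class_means - X.mean(axis=0)).T  # p x K mean differences
    D = np.linalg.solve(S_w, M)

    # Least-squares projection of B onto the column space of D; a residual
    # near zero means the two matrices share the same column space.
    C, *_ = np.linalg.lstsq(D, B, rcond=None)
    rel_resid = np.linalg.norm(B - D @ C) / np.linalg.norm(B)
    return B, D, rel_resid


B, D, rel_resid = lda_vs_regression()
print("relative residual of B outside span(D):", rel_resid)
```

The check relies on the decomposition of the total scatter into within-class and between-class parts, so it holds for plain least squares only; regularized or reduced-rank fits, as studied in the paper, would need the paper's own characterization.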