Motivated by the CATHGEN data, we develop a new statistical learning method for simultaneous variable selection and parameter estimation under the context of generalized partly linear models for data with high-dimensional covariates. The method is referred to as the broken adaptive ridge (BAR) estimator, which is an approximation of the $L_0$-penalized regression by iteratively performing reweighted squared $L_2$-penalized regression. The generalized partly linear model extends the generalized linear model by including a non-parametric component to construct a flexible model for modeling various types of covariate effects. We employ the Bernstein polynomials as the sieve space to approximate the non-parametric functions so that our method can be implemented easily using the existing R packages. Extensive simulation studies suggest that the proposed method performs better than other commonly used penalty-based variable selection methods. We apply the method to the CATHGEN data with a binary response from a coronary artery disease study, which motivated our research, and obtained new findings in both high-dimensional genetic and low-dimensional non-genetic covariates.
翻译:暂无翻译