Subgroup analysis has attracted growing attention due to its ability to identify meaningful subgroups from a heterogeneous population and thereby improving predictive power. However, in many scenarios such as social science and biology, the covariates are possibly highly correlated due to the existence of common factors, which brings great challenges for group identification and is neglected in the existing literature. In this paper, we aim to fill this gap in the ``diverging dimension" regime and propose a center-augmented subgroup identification method under the Factor Augmented (sparse) Linear Model framework, which bridge dimension reduction and sparse regression together. The proposed method is flexible to the possibly high cross-sectional dependence among covariates and inherits the computational advantage with complexity $O(nK)$, in contrast to the $O(n^2)$ complexity of the conventional pairwise fusion penalty method in the literature, where $n$ is the sample size and $K$ is the number of subgroups. We also investigate the asymptotic properties of its oracle estimators under conditions on the minimal distance between group centroids. To implement the proposed approach, we introduce a Difference of Convex functions based Alternating Direction Method of Multipliers (DC-ADMM) algorithm and demonstrate its convergence to a local minimizer in finite steps. We illustrate the superiority of the proposed method through extensive numerical experiments and a real macroeconomic data example. An \texttt{R} package \texttt{SILFS} implementing the method is also available on CRAN.
翻译:暂无翻译