High-dimensional data can often display heterogeneity due to heteroscedastic variance or inhomogeneous covariate effects. Penalized quantile and expectile regression methods offer useful tools to detect heteroscedasticity in high-dimensional data. The former is computationally challenging due to the non-smooth nature of the check loss, and the latter is sensitive to heavy-tailed error distributions. In this paper, we propose and study (penalized) robust expectile regression (retire), with a focus on iteratively reweighted $\ell_1$-penalization which reduces the estimation bias from $\ell_1$-penalization and leads to oracle properties. Theoretically, we establish the statistical properties of the retire estimator under two regimes: (i) low-dimensional regime in which $d \ll n$; (ii) high-dimensional regime in which $s\ll n\ll d$ with $s$ denoting the number of significant predictors. In the high-dimensional setting, we carefully characterize the solution path of the iteratively reweighted $\ell_1$-penalized retire estimation, adapted from the local linear approximation algorithm for folded-concave regularization. Under a mild minimum signal strength condition, we show that after as many as $\log(\log d)$ iterations the final iterate enjoys the oracle convergence rate. At each iteration, the weighted $\ell_1$-penalized convex program can be efficiently solved by a semismooth Newton coordinate descent algorithm. Numerical studies demonstrate the competitive performance of the proposed procedure compared with either non-robust or quantile regression based alternatives.
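To make the two-level structure of the procedure concrete, below is a minimal Python sketch of the iteratively reweighted $\ell_1$-penalized retire estimator under illustrative assumptions: an asymmetric Huber-type loss in place of the robustified expectile loss described in the paper, SCAD-derivative reweighting for the LLA-style outer loop, and plain proximal gradient descent as a stand-in for the semismooth Newton coordinate descent solver of the weighted $\ell_1$ subproblem. The function names (`irw_retire`, `weighted_l1_retire`) and tuning constants (`tau`, `c`, `a`) are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

def retire_loss_grad(X, y, beta, tau=0.5, c=1.345):
    """Gradient of an asymmetric Huber-type (expectile-flavored) loss:
    Huber score clipped at c, weighted by tau / (1 - tau) according to
    the sign of the residual."""
    r = y - X @ beta
    psi = np.clip(r, -c, c)                    # Huber score function
    w = np.where(r > 0, tau, 1.0 - tau)        # asymmetric expectile weights
    return -X.T @ (w * psi) / len(y)

def soft_threshold(z, t):
    """Elementwise soft-thresholding operator (prox of the weighted ell_1 norm)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def weighted_l1_retire(X, y, lam_weights, tau=0.5, c=1.345, n_iter=500):
    """Solve the weighted ell_1-penalized convex subproblem by proximal
    gradient descent -- a simple stand-in for the semismooth Newton
    coordinate descent algorithm used in the paper."""
    n, d = X.shape
    step = n / (np.linalg.norm(X, 2) ** 2)     # rough Lipschitz-based step size
    beta = np.zeros(d)
    for _ in range(n_iter):
        grad = retire_loss_grad(X, y, beta, tau, c)
        beta = soft_threshold(beta - step * grad, step * lam_weights)
    return beta

def scad_derivative(b, lam, a=3.7):
    """Derivative of the SCAD penalty, used to reweight the ell_1 penalty
    (local linear approximation of folded-concave regularization)."""
    ab = np.abs(b)
    return np.where(ab <= lam, lam,
                    np.maximum(a * lam - ab, 0.0) / (a - 1.0))

def irw_retire(X, y, lam, tau=0.5, c=1.345, n_irw=3):
    """Iteratively reweighted ell_1-penalized retire (LLA-style outer loop)."""
    d = X.shape[1]
    weights = np.full(d, lam)                  # first iterate: plain ell_1 penalty
    beta = weighted_l1_retire(X, y, weights, tau, c)
    for _ in range(n_irw - 1):                 # a few reweighting steps suffice in theory
        weights = scad_derivative(beta, lam)
        beta = weighted_l1_retire(X, y, weights, tau, c)
    return beta
```

The outer loop mirrors the theory summarized in the abstract: the first iterate is an $\ell_1$-penalized estimator, and each subsequent reweighting step shrinks the penalty on coordinates with large current estimates, so that after a small number of iterations (of order $\log(\log d)$ under the minimum signal strength condition) the active coordinates are essentially unpenalized.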