High-dimensional robust approximated M-estimators for mean regression with asymmetric data

Asymmetry along with heteroscedasticity or contamination often occurs with the growth of data dimensionality. In ultra-high dimensional data analysis, such irregular settings are usually overlooked for both theoretical and computational convenience. In this paper, we establish a framework for estimation in high-dimensional regression models using Penalized Robust Approximated quadratic M-estimators (PRAM). This framework allows general settings, such as random errors that lack symmetry and homogeneity, or covariates that are not sub-Gaussian. To reduce the possible bias caused by the data's irregularity in mean regression, PRAM adopts a loss function with a flexible robustness parameter that grows with the sample size. Theoretically, we first show that, in the ultra-high dimensional setting, PRAM estimators have local estimation consistency at the minimax rate enjoyed by the LS-Lasso. Then we show that PRAM with an appropriate non-convex penalty in fact agrees with the local oracle solution, and thus obtain its oracle property. Computationally, we demonstrate the performance of six PRAM estimators using three types of loss functions for approximation (Huber, Tukey's biweight and Cauchy loss) combined with two types of penalty functions (Lasso and MCP). Our simulation studies and real data analysis demonstrate satisfactory finite-sample performance of the PRAM estimator under general irregular settings.
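
As an illustration of the ingredients named above, the following is a minimal sketch of the three approximating losses and the two penalties, written under stated assumptions rather than taken from the paper. The function names, the textbook parameterizations of each loss, the MCP default `gamma=3.0`, and the `mean(loss) + sum(penalty)` form of the objective are all assumptions made for this example; the paper's exact scaling and its rule for growing the robustness parameter `alpha` with the sample size are not reproduced here.

```python
import numpy as np


def huber_loss(r, alpha):
    """Huber loss: quadratic for |r| <= alpha, linear beyond."""
    r = np.asarray(r, dtype=float)
    a = np.abs(r)
    return np.where(a <= alpha, 0.5 * r**2, alpha * a - 0.5 * alpha**2)


def tukey_biweight_loss(r, alpha):
    """Tukey's biweight loss: bounded, constant at alpha^2 / 6 for |r| >= alpha."""
    r = np.asarray(r, dtype=float)
    u = np.clip(np.abs(r) / alpha, 0.0, 1.0)
    return (alpha**2 / 6.0) * (1.0 - (1.0 - u**2) ** 3)


def cauchy_loss(r, alpha):
    """Cauchy loss: grows logarithmically in |r| for large residuals."""
    r = np.asarray(r, dtype=float)
    return 0.5 * alpha**2 * np.log1p((r / alpha) ** 2)


def lasso_penalty(beta, lam):
    """Lasso (L1) penalty, applied coordinate-wise."""
    return lam * np.abs(np.asarray(beta, dtype=float))


def mcp_penalty(beta, lam, gamma=3.0):
    """Minimax concave penalty (MCP); gamma=3.0 is an assumed default."""
    b = np.abs(np.asarray(beta, dtype=float))
    return np.where(
        b <= gamma * lam,
        lam * b - b**2 / (2.0 * gamma),
        0.5 * gamma * lam**2,
    )


def pram_objective(beta, X, y, alpha, lam, loss=huber_loss, penalty=lasso_penalty):
    """Penalized robust objective (assumed form): mean loss plus summed penalty."""
    residuals = y - X @ beta
    return np.mean(loss(residuals, alpha)) + np.sum(penalty(beta, lam))
```

Each of the three losses approaches the quadratic loss `r**2 / 2` as `alpha` grows, which is the sense in which they approximate least squares; pairing any of them with `lasso_penalty` or `mcp_penalty` gives the six combinations mentioned above.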
