There has been a surge of interest in developing robust estimators for models with heavy-tailed data in statistics and machine learning. This paper proposes a log-truncated M-estimator for a large family of statistical regressions and establishes its excess risk bound under the condition that the data have a finite $(1+\varepsilon)$-th moment with $\varepsilon \in (0,1]$. With an additional assumption on the associated risk function, we obtain an $\ell_2$-error bound for the estimation. We apply our theorems to establish robust M-estimators for concrete regressions. Besides convex regressions such as quantile regression and generalized linear models, many non-convex regressions also fit into our framework; in particular, we focus on robust deep neural network regressions, which can be solved by stochastic gradient descent algorithms. Simulations and real data analysis demonstrate the superiority of log-truncated estimations over standard estimations.
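To illustrate the flavor of log-truncated estimation on heavy-tailed data, here is a minimal sketch of a Catoni-style log-truncated mean estimator. The influence function $\psi$, the tuning scale $\lambda$, and the toy Student-$t$ data are illustrative assumptions for this sketch, not the paper's construction; the paper's M-estimator and its choice of truncation level differ in details.

```python
import numpy as np

def psi(x):
    # A Catoni-style influence function with logarithmic growth:
    # psi(x) = log(1 + x + x^2/2) for x >= 0, and psi(x) = -psi(-x).
    # Both log1p arguments stay positive for all real x.
    return np.where(x >= 0,
                    np.log1p(x + 0.5 * x**2),
                    -np.log1p(-x + 0.5 * x**2))

def log_truncated_mean(y, lam):
    # One-step log-truncated estimate: (1 / (n * lam)) * sum_i psi(lam * y_i).
    # Truncation damps the contribution of extreme observations.
    return psi(lam * y).sum() / (len(y) * lam)

rng = np.random.default_rng(0)
# Heavy-tailed sample: Student-t with 2 degrees of freedom has a finite
# (1 + eps)-th moment for eps < 1 but infinite variance.
y = rng.standard_t(df=2, size=10_000) + 1.0  # shifted so the true mean is 1.0
lam = np.sqrt(2.0 / len(y))  # hypothetical tuning; theory prescribes lambda
                             # from the sample size and moment bound
print(log_truncated_mean(y, lam))  # truncated estimate of the mean
print(y.mean())                    # plain empirical mean, for comparison
```

The empirical mean is driven by the few extreme draws, while the truncated estimate bounds each observation's influence logarithmically, which is what yields the sub-Gaussian-style deviation bounds under only a $(1+\varepsilon)$-th moment condition.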