For some types of real-world data, such as genetic data, adjacent genes may have similar functions, so it is important to encourage smoothness between the coefficients of adjacent genes. In this setting, the standard lasso penalty is no longer appropriate, because it promotes sparsity of individual coefficients but ignores their ordering. On the other hand, in high-dimensional statistics, datasets are often contaminated by outliers or contain variables with heavy-tailed distributions, which makes many conventional methods inadequate. To address both issues, in this paper we propose an adaptive Huber regression for robust estimation and inference, in which the fused lasso penalty is used to encourage sparsity of the coefficients as well as sparsity of their successive differences, i.e., local constancy of the coefficient profile. Theoretically, we establish nonasymptotic estimation error bounds under the $\ell_2$-norm in the high-dimensional setting. The proposed estimator is defined by a convex, nonsmooth and separable optimization problem, so the alternating direction method of multipliers (ADMM) can be employed to compute it. Finally, we conduct simulation studies and real cancer data analyses, which illustrate that the proposed estimation method is more robust and has better predictive performance.
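The abstract describes the estimator and its ADMM solver only in words. The block below is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the objective $\frac{1}{n}\sum_{i=1}^n h_\tau(y_i - x_i^\top\beta) + \lambda_1\|\beta\|_1 + \lambda_2\sum_{j=2}^p|\beta_j - \beta_{j-1}|$, where $h_\tau$ is the Huber loss, and it assumes one particular ADMM splitting: residuals $r = y - X\beta$ and $z = D\beta$ with $D = [I; F]$, where $F$ is the first-difference matrix. The robustification parameter $\tau$ is treated as a fixed input, so any data-adaptive rule for choosing $\tau$ is not reproduced here. All function names (`huber_prox`, `soft_threshold`, `huber_fused_lasso_admm`) and the default step size `rho` are hypothetical.

```python
import numpy as np


def huber_prox(v, tau, kappa):
    """Elementwise prox of kappa * h_tau: argmin_r kappa*h_tau(r) + (r - v)**2 / 2,
    where h_tau(r) = r**2 / 2 if |r| <= tau else tau*|r| - tau**2 / 2."""
    quadratic = v / (1.0 + kappa)            # solution when |r| <= tau
    linear = v - kappa * tau * np.sign(v)    # solution when |r| > tau
    return np.where(np.abs(v) <= tau * (1.0 + kappa), quadratic, linear)


def soft_threshold(v, thresh):
    """Elementwise prox of thresh * |.| (thresh may be a vector)."""
    return np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)


def huber_fused_lasso_admm(X, y, tau, lam1, lam2, rho=None, max_iter=5000, tol=1e-6):
    """Hypothetical ADMM sketch for
        min_beta (1/n) sum_i h_tau(y_i - x_i' beta)
                 + lam1 * ||beta||_1 + lam2 * sum_j |beta_j - beta_{j-1}|
    using the splitting X beta + r = y and D beta = z, with D = [I; F]
    and F the (p-1) x p first-difference matrix."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    if rho is None:
        rho = 1.0 / n                         # heuristic scaling for the 1/n-averaged loss
    F = np.diff(np.eye(p), axis=0)            # row j is e_{j+1} - e_j
    D = np.vstack([np.eye(p), F])
    thresh = np.concatenate([np.full(p, lam1), np.full(p - 1, lam2)]) / rho
    kappa = 1.0 / (n * rho)                   # prox weight for the Huber term
    A = X.T @ X + D.T @ D                     # positive definite because D contains I

    beta = np.zeros(p)
    r = y.copy()                              # X beta + r = y holds at the start
    z = np.zeros(2 * p - 1)
    u = np.zeros(n)                           # scaled dual for X beta + r = y
    w = np.zeros(2 * p - 1)                   # scaled dual for D beta = z
    for _ in range(max_iter):
        # beta-step: minimizes ||X b + r - y + u||^2 + ||D b - z + w||^2
        beta = np.linalg.solve(A, X.T @ (y - r - u) + D.T @ (z - w))
        Xb, Db = X @ beta, D @ beta
        r_old, z_old = r, z
        # (r, z)-step: separable proximal updates
        r = huber_prox(y - Xb - u, tau, kappa)
        z = soft_threshold(Db + w, thresh)
        # scaled dual ascent on both constraints
        u = u + Xb + r - y
        w = w + Db - z
        primal = np.sqrt(np.sum((Xb + r - y) ** 2) + np.sum((Db - z) ** 2))
        dual = rho * np.linalg.norm(X.T @ (r - r_old) - D.T @ (z - z_old))
        if primal < tol and dual < tol:
            break
    return beta
```

Because $D$ stacks the identity on top of $F$, the matrix $X^\top X + D^\top D$ stays invertible even when $p > n$, which matters in the high-dimensional setting. The objective splits into a Huber term on $r$ and an $\ell_1$ term on $z$, so each ADMM iteration reduces to one linear solve plus two closed-form proximal steps. For large $p$ one would factorize that matrix once (for example with a Cholesky decomposition) instead of calling a dense solver on every iteration.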