Penalized logistic regression is widely used for binary classification when the number of covariates is much larger than the sample size, with several real-life applications including genomic disease classification. However, existing methods based on the likelihood loss function are sensitive to data contamination and other noise; robust methods are therefore needed for stable and more accurate inference. In this paper, we propose a family of robust estimators for sparse logistic models that combines the popular density power divergence based loss function with general adaptively weighted LASSO penalties. We study the local robustness of the proposed estimators through their influence function, and we also derive their oracle properties and asymptotic distribution. With extensive empirical illustrations, we demonstrate the significantly improved performance of our proposed estimators over the existing ones, with a particular gain in robustness. Finally, our proposal is applied to four different real datasets for cancer classification, yielding robust and accurate models that simultaneously perform gene selection and patient classification.
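As a rough illustration of the kind of objective described above, the following sketch evaluates a density power divergence (DPD) loss for a Bernoulli logistic model plus a weighted L1 penalty. It assumes the standard textbook form of the DPD for a discrete model; the exact loss, tuning scheme, and weight construction used in the paper may differ. The names `dpd_logistic_loss`, `penalized_objective`, `alpha`, `lam`, and `weights` are hypothetical and chosen only for this example.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping the linear predictor to a probability."""
    return 1.0 / (1.0 + np.exp(-z))


def dpd_logistic_loss(beta, X, y, alpha):
    """Average DPD loss for a Bernoulli logistic model (assumed standard form).

    alpha > 0 is the robustness tuning parameter; as alpha -> 0 the loss
    approaches the average negative log-likelihood.
    """
    p = sigmoid(X @ beta)
    # Sum of f^(1 + alpha) over the two possible outcomes y in {0, 1}.
    model_term = p ** (1 + alpha) + (1 - p) ** (1 + alpha)
    # Model probability assigned to the outcome that was actually observed.
    f_obs = np.where(y == 1, p, 1 - p)
    data_term = (1 + 1 / alpha) * f_obs ** alpha
    # The constant 1 / alpha does not affect the minimizer; it is kept so that
    # the alpha -> 0 limit is exactly the negative log-likelihood.
    return np.mean(model_term - data_term + 1 / alpha)


def penalized_objective(beta, X, y, alpha, lam, weights):
    """DPD loss plus an adaptively weighted LASSO penalty (illustrative only)."""
    penalty = lam * np.sum(weights * np.abs(beta))
    return dpd_logistic_loss(beta, X, y, alpha) + penalty
```

Larger values of `alpha` downweight observations to which the model assigns low probability, which is the mechanism behind the robustness to contamination; the price is some loss of efficiency under clean data. Minimizing `penalized_objective` over `beta` would require a separate optimizer that handles the non-smooth penalty, which is not shown here.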