High-dimensional regression and regression with a left-censored response are each well-studied topics. Even so, few methods have been proposed that handle both complications simultaneously. The Tobit model, long the standard method for censored regression in economics, has not been adapted for high-dimensional regression at all. To fill this gap and bring modern techniques from high-dimensional statistics to left-censored regression, we propose several penalized Tobit models. We develop a fast algorithm that combines quadratic minimization with coordinate descent to compute the penalized Tobit solution path. Theoretically, we analyze the Tobit lasso and Tobit with a folded concave penalty, bounding the $\ell_2$ estimation loss for the former and proving that a local linear approximation estimator for the latter possesses the strong oracle property. Through an extensive simulation study, we find that our penalized Tobit models provide more accurate predictions and parameter estimates than other methods. We use a penalized Tobit model to analyze high-dimensional left-censored HIV viral load data from the AIDS Clinical Trials Group and identify potential drug resistance mutations in the HIV genome. Appendices contain intermediate theoretical results and technical proofs.
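
For readers unfamiliar with the model, the following is a minimal sketch of a lasso-penalized Tobit objective for a response left-censored at a known threshold `c`. It is not the authors' algorithm: it only evaluates the objective and does not compute a solution path. The function name `tobit_lasso_objective`, the $(\beta_0, \beta, \sigma)$ parameterization, the $1/n$ scaling, and the unpenalized intercept are all assumptions made for illustration.

```python
import math


def normal_logpdf(z):
    """Log density of the standard normal distribution at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def normal_logcdf(z):
    """Log CDF of the standard normal distribution at z.

    erfc is used for accuracy in the lower tail; the floor guards
    against log(0) when the probability underflows.
    """
    p = 0.5 * math.erfc(-z / math.sqrt(2.0))
    return math.log(max(p, 1e-300))


def tobit_lasso_objective(X, y, beta0, beta, sigma, lam, c=0.0):
    """Average negative Tobit log-likelihood plus an l1 penalty on beta.

    Assumed latent model: y* = beta0 + x'beta + eps, eps ~ N(0, sigma^2),
    with the observed response y = max(y*, c).
    X is a list of rows, y a list of responses; the intercept beta0 is
    left unpenalized (an illustrative choice, not taken from the paper).
    """
    n = len(y)
    nll = 0.0
    for xi, yi in zip(X, y):
        mu = beta0 + sum(b * x for b, x in zip(beta, xi))
        if yi > c:
            # Uncensored: normal density of the residual, scaled by sigma.
            nll -= normal_logpdf((yi - mu) / sigma) - math.log(sigma)
        else:
            # Censored at c: probability that the latent response is <= c.
            nll -= normal_logcdf((c - mu) / sigma)
    return nll / n + lam * sum(abs(b) for b in beta)
```

Censored observations contribute through the normal CDF and uncensored ones through the normal density, which is what distinguishes this objective from an ordinary penalized least-squares loss. A solver such as coordinate descent would minimize it over `beta0`, `beta`, and `sigma` for each value of `lam`.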