High-dimensional regression and regression with a left-censored response are each well-studied topics. Even so, few methods have been proposed that handle both complications simultaneously. The Tobit model, long the standard method for censored regression in economics, has not been adapted for high-dimensional regression at all. To fill this gap and bring up-to-date techniques from high-dimensional statistics to left-censored regression, we propose several penalized Tobit models. We develop a fast algorithm that combines quadratic minimization with coordinate descent to compute the penalized Tobit solution path. Theoretically, we analyze the Tobit lasso and Tobit with a folded concave penalty, bounding the $\ell_2$ estimation loss for the former and proving that a local linear approximation estimator for the latter possesses the strong oracle property. In an extensive simulation study, our penalized Tobit models provide more accurate predictions and parameter estimates than other methods. We use a penalized Tobit model to analyze high-dimensional left-censored HIV viral load data from the AIDS Clinical Trials Group and identify potential drug resistance mutations in the HIV genome. Appendices contain intermediate theoretical results and technical proofs.
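For readers unfamiliar with the Tobit likelihood, the following is a minimal sketch of what a lasso-penalized Tobit objective might look like. It assumes Gaussian errors, a known left-censoring point (0 by default), and an $\ell_1$ penalty on the slope coefficients only. The function name, arguments, and the standard $(\beta, \sigma)$ parameterization are illustrative assumptions, not the paper's notation or algorithm; it evaluates the objective and does not perform the coordinate descent fitting described above.

```python
import math


def normal_log_pdf(z):
    """Log density of the standard normal distribution at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def normal_log_cdf(z):
    """Log CDF of the standard normal, via Phi(z) = 0.5 * erfc(-z / sqrt(2)).

    Adequate for moderate z; it can underflow for very negative z.
    """
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def tobit_lasso_objective(X, y, beta, intercept, sigma, lam, censor_point=0.0):
    """Average negative Tobit log-likelihood plus an l1 penalty on beta.

    Hypothetical helper for illustration only.
    X: list of rows (lists of floats); y: observed, left-censored responses.
    Observations with y <= censor_point are treated as censored.
    """
    n = len(y)
    neg_loglik = 0.0
    for xi, yi in zip(X, y):
        mu = intercept + sum(b * x for b, x in zip(beta, xi))
        if yi > censor_point:
            # Uncensored: Gaussian density of the residual, scaled by 1 / sigma.
            neg_loglik -= normal_log_pdf((yi - mu) / sigma) - math.log(sigma)
        else:
            # Censored: probability that the latent response falls at or below
            # the censoring point.
            neg_loglik -= normal_log_cdf((censor_point - mu) / sigma)
    penalty = lam * sum(abs(b) for b in beta)
    return neg_loglik / n + penalty


if __name__ == "__main__":
    X = [[1.0], [0.0]]
    y = [1.0, 0.0]  # the second observation is censored at 0
    print(tobit_lasso_objective(X, y, beta=[1.0], intercept=0.0, sigma=1.0, lam=0.5))
```

The intercept and $\sigma$ are left unpenalized here, which is a common convention for penalized likelihoods but is an assumption of this sketch.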