Sparse linear regression methods for high-dimensional data commonly assume that residuals have constant variance, which can be violated in practice. For example, Aphasia Quotient (AQ) is a critical measure of language impairment and informs treatment decisions, but it is challenging to measure in stroke patients. It is of interest to use high-resolution T2 neuroimages of brain damage to predict AQ. However, sparse regression models show marked evidence of heteroscedastic error even after transformations are applied. This violation of the homoscedasticity assumption can lead to bias in estimated coefficients, prediction intervals (PI) with improper length, and increased type I errors. Bayesian heteroscedastic linear regression models relax the homoscedastic error assumption but can enforce restrictive prior assumptions on parameters, and many are computationally infeasible in the high-dimensional setting. This paper proposes estimating high-dimensional heteroscedastic linear regression models using a heteroscedastic partitioned empirical Bayes Expectation Conditional Maximization (H-PROBE) algorithm. H-PROBE is a computationally efficient maximum a posteriori estimation approach that requires minimal prior assumptions and can incorporate covariates hypothesized to impact heterogeneity. We apply this method by using high-dimensional neuroimages to predict and provide PIs for AQ that accurately quantify predictive uncertainty. Our analysis demonstrates that H-PROBE can provide narrower PI widths than standard methods without sacrificing coverage. Narrower PIs are clinically important for determining the risk of moderate to severe aphasia. Additionally, through extensive simulation studies, we exhibit that H-PROBE results in superior prediction, variable selection, and predictive inference compared to alternative methods.
翻译:暂无翻译