Linear mixed models (LMMs), which typically assume normality for both the random effects and the error terms, are a popular class of methods for analyzing longitudinal and clustered data. However, such models can be sensitive to outliers, which can lead to poor statistical results (e.g., biased inference on model parameters and inaccurate prediction of random effects) if the data are contaminated. We propose a new approach to robust estimation and inference for LMMs using a hierarchical gamma divergence, which offers an automated, data-driven way to downweight the effects of outliers occurring in both the errors and the random effects, using normalized powered density weights. For estimation and inference, we develop a computationally scalable minorization-maximization algorithm for the resulting objective function, along with a clustered bootstrap method for uncertainty quantification and a Hyvärinen score criterion for selecting a tuning parameter that controls the degree of robustness. When the genuine and contaminating mixed-effects distributions are sufficiently separated, and under suitable regularity conditions as the number of clusters tends to infinity, we show that the resulting robust estimates can be asymptotically controlled even under heavy (covariate-dependent) contamination. Simulation studies demonstrate that the hierarchical gamma divergence consistently outperforms several currently available methods for robustifying LMMs across a wide range of outlier-generation scenarios at both the response and random-effects levels. We illustrate the proposed method using data from a multi-center AIDS cohort study, where a robust LMM fitted with the hierarchical gamma divergence produces noticeably different results from methods that do not adequately adjust for potential outlier contamination.
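To illustrate what "normalized powered density weights" means, here is a minimal sketch of the weighting idea in a deliberately simplified setting. It is not the authors' hierarchical algorithm: it assumes a single univariate normal model with known scale `sigma` and no random effects. The function names `gamma_weights` and `robust_location` are hypothetical. Each observation receives the weight f(y_i)^γ / Σ_j f(y_j)^γ, where γ > 0 plays the role of the robustness tuning parameter. The location is then updated as the weighted mean, which is a fixed-point iteration for the gamma-divergence estimating equation Σ_i f(y_i)^γ (y_i − μ) = 0 in this toy case.

```python
import math
import statistics

def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def gamma_weights(ys, mu, sigma, gamma):
    """Hypothetical helper: normalized powered density weights f(y_i)^gamma / sum_j f(y_j)^gamma.

    gamma = 0 gives equal weights (no downweighting); larger gamma
    shrinks the weight of points with low model density (likely outliers).
    """
    powered = [normal_pdf(y, mu, sigma) ** gamma for y in ys]
    total = sum(powered)
    return [p / total for p in powered]

def robust_location(ys, sigma, gamma, iters=50):
    """Toy univariate analogue (assumption, not the paper's MM algorithm):
    repeatedly set mu to the gamma-weighted mean, starting from the median."""
    mu = statistics.median(ys)
    for _ in range(iters):
        w = gamma_weights(ys, mu, sigma, gamma)
        mu = sum(wi * yi for wi, yi in zip(w, ys))
    return mu

ys = [0.1, -0.3, 0.4, 0.0, 8.0]          # last value is a gross outlier
w = gamma_weights(ys, mu=0.0, sigma=1.0, gamma=0.5)
mu_hat = robust_location(ys, sigma=1.0, gamma=0.5)
print([round(x, 4) for x in w])
print(round(mu_hat, 4), round(statistics.mean(ys), 4))
```

In this sketch the outlier at 8.0 receives a weight on the order of 1e-8, so the robust location stays near the centre of the four inliers, while the ordinary mean is pulled up to 1.64. Setting `gamma=0` recovers equal weights.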