Rapidly finding effective therapeutics requires efficient use of the resources available in clinical trials. Covariate adjustment can yield statistical estimates with improved precision, reducing the number of participants required to draw futility or efficacy conclusions. We focus on time-to-event and ordinal outcomes. A key question for covariate adjustment in randomized studies is how to fit a model relating the outcome to the baseline covariates so as to maximize precision. We present a novel theoretical result establishing conditions for asymptotic normality of a variety of covariate-adjusted estimators that rely on machine learning (e.g., l1-regularization, Random Forests, XGBoost, and Multivariate Adaptive Regression Splines), under the assumption that outcome data are missing completely at random. We further present a consistent estimator of the asymptotic variance. Importantly, the conditions do not require the machine learning methods to converge to the true outcome distribution conditional on baseline variables, as long as they converge to some (possibly incorrect) limit. We conducted a simulation study to evaluate the performance of the aforementioned prediction methods in COVID-19 trials, using longitudinal data from over 1,500 patients hospitalized with COVID-19 at Weill Cornell Medicine New York Presbyterian Hospital. We found that l1-regularization led to estimators and corresponding hypothesis tests that control type 1 error and are more precise than an unadjusted estimator across all sample sizes tested. We also found that when covariates are not prognostic of the outcome, l1-regularization remains as precise as the unadjusted estimator, even at small sample sizes (n = 100). We provide an R package, adjrct, that performs model-robust covariate adjustment for ordinal and time-to-event outcomes.
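The precision gain from covariate adjustment can be illustrated with a small simulation. The following is only a minimal sketch under simplifying assumptions that are not taken from the paper: a continuous outcome, a single baseline covariate, and a linear (ANCOVA-style) adjustment rather than the machine learning estimators for ordinal and time-to-event outcomes studied here. It does not use the adjrct package, and all function names and parameter values are hypothetical.

```python
import random
import statistics


def simulate_trial(n, effect, beta, rng):
    """Simulate one two-arm randomized trial (illustrative data model only)."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]   # baseline covariate
    a = [rng.randint(0, 1) for _ in range(n)]     # 1:1 randomization
    y = [effect * ai + beta * xi + rng.gauss(0.0, 1.0)
         for ai, xi in zip(a, x)]                 # continuous outcome
    return x, a, y


def unadjusted(x, a, y):
    """Difference in mean outcome between the two arms."""
    y1 = [yi for ai, yi in zip(a, y) if ai == 1]
    y0 = [yi for ai, yi in zip(a, y) if ai == 0]
    return statistics.fmean(y1) - statistics.fmean(y0)


def adjusted(x, a, y):
    """ANCOVA-style estimate: correct for chance imbalance in x."""
    sxy = 0.0
    sxx = 0.0
    means = {}
    for arm in (0, 1):
        xs = [xi for ai, xi in zip(a, x) if ai == arm]
        ys = [yi for ai, yi in zip(a, y) if ai == arm]
        mx, my = statistics.fmean(xs), statistics.fmean(ys)
        means[arm] = (mx, my)
        # Accumulate within-arm sums to get a pooled slope of y on x.
        sxy += sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys))
        sxx += sum((xi - mx) ** 2 for xi in xs)
    slope = sxy / sxx
    (mx0, my0), (mx1, my1) = means[0], means[1]
    return (my1 - my0) - slope * (mx1 - mx0)


def compare_variances(n_trials=500, n=200, effect=1.0, beta=2.0, seed=0):
    """Empirical variance of both estimators over repeated simulated trials."""
    rng = random.Random(seed)
    unadj, adj = [], []
    for _ in range(n_trials):
        x, a, y = simulate_trial(n, effect, beta, rng)
        unadj.append(unadjusted(x, a, y))
        adj.append(adjusted(x, a, y))
    return statistics.variance(unadj), statistics.variance(adj)


if __name__ == "__main__":
    var_unadj, var_adj = compare_variances()
    print(f"unadjusted variance: {var_unadj:.4f}")
    print(f"adjusted variance:   {var_adj:.4f}")
```

Setting `beta=0.0` makes the covariate non-prognostic; in this toy setting the two estimators should then have nearly the same variance, which loosely mirrors the abstract's observation about non-prognostic covariates.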