Traditionally, spline or kernel approaches in combination with parametric estimation are used to infer the linear coefficient (fixed effects) in a partially linear mixed-effects model (PLMM) for repeated measurements. Using machine learning algorithms allows us to incorporate more complex interaction structures and high-dimensional variables. We employ double machine learning to cope with the nonparametric part of the PLMM: the nonlinear variables are regressed out nonparametrically from both the linear variables and the response. This adjustment can be performed with any machine learning algorithm, for instance random forests. The adjusted variables satisfy a linear mixed-effects model, where the linear coefficient can be estimated with standard linear mixed-effects techniques. We prove that the estimated fixed effects coefficient converges at the parametric rate and is asymptotically Gaussian distributed and semiparametrically efficient. Empirical examples demonstrate our proposed algorithm. We present two simulation studies and analyze a dataset with repeated CD4 cell counts from HIV patients. Software code for our method is available in the R-package dmlalg.
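The partialling-out idea described above can be illustrated with a minimal, standard-library-only sketch. It is not the dmlalg implementation, and every name in it (`simulate`, `knn_predict`, `residualize`, `slope_through_origin`) is hypothetical. It rests on three assumptions: a k-nearest-neighbour smoother stands in for the machine learning algorithm (a random forest would play the same role), sample splitting is done by group so that repeated measurements from one subject stay in the same fold, and plain least squares on the residuals stands in for the final linear mixed-effects fit, which would in practice be done with a dedicated mixed-model routine.

```python
import math
import random


def simulate(n_groups=100, n_per_group=5, beta=0.5, seed=0):
    """Simulate repeated measurements: Y = beta*X + g(W) + b_group + noise."""
    rng = random.Random(seed)
    rows = []  # each row is (group, w, x, y)
    for group in range(n_groups):
        b = rng.gauss(0.0, 0.5)  # random intercept shared within the group
        for _ in range(n_per_group):
            w = rng.uniform(-2.0, 2.0)             # nonlinear (nuisance) variable
            x = math.sin(w) + rng.gauss(0.0, 1.0)  # linear variable, depends on w
            y = beta * x + math.cos(w) + w + b + rng.gauss(0.0, 0.5)
            rows.append((group, w, x, y))
    return rows


def knn_predict(train_w, train_t, query_w, k):
    """Average the targets of the k training points closest to query_w."""
    nearest = sorted(range(len(train_w)),
                     key=lambda j: abs(train_w[j] - query_w))[:k]
    return sum(train_t[j] for j in nearest) / len(nearest)


def residualize(rows, n_folds=2, k=15):
    """Regress W out of X and Y, fitting on other folds (split by group)."""
    groups = sorted({r[0] for r in rows})
    res_x, res_y = [], []
    for fold in range(n_folds):
        held_out = {g for i, g in enumerate(groups) if i % n_folds == fold}
        train = [r for r in rows if r[0] not in held_out]
        test = [r for r in rows if r[0] in held_out]
        train_w = [r[1] for r in train]
        train_x = [r[2] for r in train]
        train_y = [r[3] for r in train]
        for _, w, x, y in test:
            res_x.append(x - knn_predict(train_w, train_x, w, k))
            res_y.append(y - knn_predict(train_w, train_y, w, k))
    return res_x, res_y


def slope_through_origin(res_x, res_y):
    """Least-squares slope of res_y on res_x (stand-in for the mixed-model fit)."""
    return sum(a * b for a, b in zip(res_x, res_y)) / sum(a * a for a in res_x)


rows = simulate()
res_x, res_y = residualize(rows)
beta_hat = slope_through_origin(res_x, res_y)
print(f"estimated beta: {beta_hat:.3f} (simulated truth: 0.5)")
```

The least-squares step ignores the within-group correlation, so it gives a point estimate of the linear coefficient but not the efficient estimator or the variance components that the mixed-effects fit in the paper provides.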