This work demonstrates how mixed effects random forests enable accurate prediction of depression severity from multimodal physiological and digital activity data collected in an 8-week study of 31 patients with major depressive disorder. We show that mixed effects random forests outperform standard random forests and personal average baselines when predicting clinical Hamilton Depression Rating Scale (HDRS_17) scores. Compared to the personal average baseline, mean absolute error is significantly reduced for each patient, by 0.199-0.276 on average (p<0.05). This is noteworthy because such simple baselines frequently outperform machine learning methods in mental health prediction tasks. We suggest that the improved performance results from the ability of the mixed effects random forest to personalise model parameters to individuals in the dataset. However, we find that these improvements apply only when labelled data from the patient are available to the model at training time. Investigating methods that improve accuracy when generalising to new patients is left as important future work.
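As a rough illustration of the personal average baseline and the mean absolute error metric referred to above, the following minimal sketch predicts each patient's score as the mean of that patient's own training scores and compares it with a single global average. The patient identifiers, the scores, and the function names are hypothetical and purely synthetic; this is not the study's data or code, and the mixed effects random forest itself is not implemented here.

```python
from collections import defaultdict
from statistics import mean


def personal_average_baseline(train):
    """Map each patient id to the mean of that patient's training scores."""
    scores = defaultdict(list)
    for patient_id, score in train:
        scores[patient_id].append(score)
    return {patient_id: mean(vals) for patient_id, vals in scores.items()}


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted scores."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


# Synthetic (patient id, score) pairs; assumed values for illustration only.
train = [("p1", 10), ("p1", 12), ("p1", 14),
         ("p2", 20), ("p2", 22), ("p2", 24)]
test = [("p1", 13), ("p2", 21)]

personal = personal_average_baseline(train)
global_avg = mean(score for _, score in train)

y_true = [score for _, score in test]
mae_personal = mean_absolute_error(y_true, [personal[pid] for pid, _ in test])
mae_global = mean_absolute_error(y_true, [global_avg for _ in test])

print(f"personal average MAE: {mae_personal}")
print(f"global average MAE:   {mae_global}")
```

In this sketch a patient who does not appear in `train` has no entry in `personal`, which mirrors the limitation noted above: a personalised predictor of this kind needs labelled data from the patient at training time.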