Recent evidence highlights the usefulness of DNA methylation (DNAm) biomarkers as surrogates for exposure to risk factors for non-communicable diseases in epidemiological studies and randomized trials. DNAm variability has been demonstrated to be tightly related to lifestyle behavior and exposure to environmental risk factors, ultimately providing an unbiased proxy of an individual's state of health. At present, the creation of DNAm surrogates relies on univariate penalized regression models, with the elastic-net regularizer being the gold standard for the task. Nonetheless, more advanced modeling procedures are required in the presence of multivariate outcomes with a structured dependence pattern among the study samples. In this work we propose a general framework for mixed-effects multitask learning in the presence of high-dimensional predictors to develop a multivariate DNAm biomarker from a multi-center study. A penalized estimation scheme based on an expectation-maximization algorithm is devised, in which any penalty criterion for fixed-effects models can be conveniently incorporated in the fitting process. We apply the proposed methodology to create novel DNAm surrogate biomarkers for multiple correlated risk factors for cardiovascular diseases and comorbidities. We show that the proposed approach, modeling multiple outcomes together, outperforms state-of-the-art alternatives, both in predictive power and in bio-molecular interpretation of the results.