The development of non-invasive biomarkers from blood DNA methylation (DNAm) profiles is a cutting-edge achievement in personalized medicine: DNAm epimutations have been shown to be tightly related to lifestyle and environmental risk factors, ultimately providing an unbiased proxy of an individual's state of health. At present, the creation of DNAm surrogates relies on univariate penalized regression models, with the elastic net being the standard choice for the task. Nonetheless, more advanced modeling procedures are required when the response is multivariate in nature and the samples exhibit a structured dependence pattern. In this work, with the aim of developing a multivariate DNAm biomarker from a multi-centric study, we propose a general framework for high-dimensional, mixed-effects multitask learning. A penalized estimation scheme based on an EM algorithm is devised, in which any penalty criterion for fixed-effects models can be conveniently incorporated in the fitting process. The methodology is then employed to create a novel surrogate of cardiovascular and high blood pressure comorbidities, showing better results, both in terms of predictive power and epidemiological interpretation, than state-of-the-art alternatives.