The exposome recognizes that individuals are exposed simultaneously to a multitude of different environmental factors and takes a holistic approach to the discovery of etiological factors for disease. However, challenges arise when trying to quantify the health effects of complex exposure mixtures. Analytical challenges include dealing with high dimensionality, studying the combined effects of these exposures and their interactions, integrating causal pathways, and integrating omics layers. To tackle these challenges, the ISGlobal Exposome Hub held a data challenge event open to researchers from all over the world and from all areas of expertise. Analysts had the chance to compete and apply state-of-the-art methods to a common, partially simulated exposome dataset (based on real case data from the HELIX project) comprising multiple correlated exposure variables (P > 100) arising from general and personal environments at different time points, biological molecular data (multi-omics: DNA methylation, gene expression, proteins, metabolomics), and multiple clinical phenotypes in 1301 mother-child pairs. Most of the methods presented included feature selection or feature reduction to deal with the high dimensionality of the exposome dataset. Several approaches explicitly searched for combined effects of exposures and/or their interactions using linear index models or response surface methods, including Bayesian methods. Other methods dealt with the multi-omics dataset in mediation analyses using multiple-step approaches. Here we discuss the statistical models and provide the data and code used, so that analysts have examples of implementation and can learn how to use these methods. Overall, the exposome data challenge presented a unique opportunity for researchers from different disciplines to create and share methods, setting a new standard for open science in the exposome and environmental health field.
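Linear index models, one of the families of approaches mentioned above, summarize an exposure mixture as a single weighted score. The sketch below is a simplified, hypothetical illustration in the spirit of weighted quantile sum (WQS) regression, not code submitted by any challenge participant. It assumes that each exposure is scored into quartiles and that the model is y = β0 + β1 Σ_j w_j q_j with w_j ≥ 0 and Σ_j w_j = 1. The constrained fit is obtained by non-negative least squares on centred data, and the coefficients are then rescaled so that the weights sum to one. It assumes all exposures act in the same (positive) direction and omits the bootstrap and training/validation split used in full WQS implementations. The function names and the simulated data are hypothetical.

```python
import numpy as np


def quantile_scores(X, n_q=4):
    """Replace each exposure column by its quantile category 0..n_q-1."""
    probs = np.linspace(0, 1, n_q + 1)[1:-1]  # interior cut points, e.g. 0.25, 0.5, 0.75
    Q = np.empty_like(X, dtype=float)
    for j in range(X.shape[1]):
        cuts = np.quantile(X[:, j], probs)
        Q[:, j] = np.digitize(X[:, j], cuts)
    return Q


def nnls_cd(A, b, n_iter=500):
    """Non-negative least squares, min ||b - A x||^2 s.t. x >= 0, by cyclic coordinate descent."""
    x = np.zeros(A.shape[1])
    r = b.copy()  # residual b - A x (x starts at zero)
    col_sq = (A ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(A.shape[1]):
            if col_sq[j] == 0:
                continue
            # Exact minimizer along coordinate j, projected onto x_j >= 0.
            new = max(0.0, x[j] + A[:, j] @ r / col_sq[j])
            r -= A[:, j] * (new - x[j])
            x[j] = new
    return x


def fit_linear_index(X, y, n_q=4):
    """Hypothetical WQS-style fit: returns (beta0, beta1, weights)."""
    Q = quantile_scores(X, n_q)
    # Centering profiles out the intercept, so only the slopes are constrained.
    c = nnls_cd(Q - Q.mean(axis=0), y - y.mean())
    beta1 = c.sum()  # c_j = beta1 * w_j and the weights sum to one
    if beta1 > 0:
        w = c / beta1
    else:
        # No positive mixture effect: weights are not identified, use equal weights.
        w = np.full(len(c), 1.0 / len(c))
    beta0 = y.mean() - beta1 * (Q.mean(axis=0) @ w)
    return beta0, beta1, w


# Simulated example (hypothetical data): three of five exposures drive the outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
w_true = np.array([0.5, 0.3, 0.2, 0.0, 0.0])
y = 1.0 + 2.0 * (quantile_scores(X) @ w_true) + rng.normal(scale=0.5, size=500)
beta0, beta1, w = fit_linear_index(X, y)
print(round(beta1, 2), np.round(w, 2))
```

The estimated weights can be read as each exposure's relative contribution to the mixture index, and β1 as the change in outcome per one-quantile increase in all exposures jointly. Coordinate descent was chosen only to keep the sketch dependent on NumPy alone; a library NNLS solver would serve equally well.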