Paper title
State-of-the-Art Methods for Exposure-Health Studies: Results from the Exposome Data Challenge Event
Paper authors
Paper abstract
The exposome concept recognizes that individuals are exposed simultaneously to a multitude of environmental factors and takes a holistic approach to discovering the etiological factors of disease. However, quantifying the health effects of complex exposure mixtures raises challenges. Analytical challenges include handling high dimensionality, studying the combined effects of exposures and their interactions, integrating causal pathways, and integrating omics layers. To tackle these challenges, the ISGlobal Exposome Hub held a data challenge event open to researchers from all over the world and from all areas of expertise. Analysts had the chance to compete and apply state-of-the-art methods to a common, partially simulated exposome dataset (based on real case data from the HELIX project) comprising multiple correlated exposure variables (P > 100) arising from general and personal environments at different time points, biological molecular data (multi-omics: DNA methylation, gene expression, proteins, metabolomics), and multiple clinical phenotypes in 1301 mother-child pairs. Most of the methods presented included feature selection or feature reduction to deal with the high dimensionality of the exposome dataset. Several approaches explicitly searched for combined effects of exposures and/or their interactions using linear index models or response surface methods, including Bayesian methods. Other methods handled the multi-omics dataset in mediation analyses using multiple-step approaches. Here we discuss the statistical models and provide the data and code used, so that analysts have implementation examples and can learn how to use these methods. Overall, the exposome data challenge presented a unique opportunity for researchers from different disciplines to create and share methods, setting a new standard for open science in the exposome and environmental health field.
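To make the feature-selection idea concrete, here is a minimal sketch of marginal screening on simulated, block-correlated exposures. It is not one of the methods presented at the challenge and does not use the HELIX data: the simulation design, the function names (`pearson`, `simulate`, `screen`), and all parameter values (block size, correlation `rho`, effect sizes, causal indices) are assumptions chosen purely for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=1000, p=120, block=10, rho=0.5, causal=(0, 50), seed=1):
    """Simulate block-correlated exposures and an outcome driven by `causal`.

    Hypothetical design: p must be a multiple of block; exposures in the
    same block share one latent factor, giving pairwise correlation rho.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        latent = [rng.gauss(0, 1) for _ in range(p // block)]
        row = [
            math.sqrt(rho) * latent[j // block]
            + math.sqrt(1 - rho) * rng.gauss(0, 1)
            for j in range(p)
        ]
        X.append(row)
        # Outcome: linear effect of the causal exposures plus unit noise.
        y.append(sum(2.0 * row[j] for j in causal) + rng.gauss(0, 1))
    return X, y


def screen(X, y, k):
    """Rank exposures by |Pearson r| with the outcome and keep the top k."""
    p = len(X[0])
    scores = [(abs(pearson([row[j] for row in X], y)), j) for j in range(p)]
    scores.sort(reverse=True)
    return scores[:k]


X, y = simulate()
top = screen(X, y, k=5)
for score, j in top:
    print(f"exposure {j}: |r| = {score:.2f}")
```

Under this design the two causal exposures should rank first, but the remaining top positions tend to be filled by their within-block neighbours, which carry no direct effect. That is one reason analyses of correlated exposures often turn to multivariable selection or dimension reduction rather than one-at-a-time screening.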