Paper title

Comparison between instrumental variable and mediation-based methods for reconstructing causal gene networks in yeast

Authors

Adriaan-Alexander Ludl, Tom Michoel

Abstract

Causal gene networks model the flow of information within a cell, but reconstructing them from omics data is challenging because correlation does not imply causation. Combining genomics and transcriptomics data from a segregating population makes it possible to orient the direction of causality between gene expression traits using genomic variants. Instrumental-variable (IV) methods use a local expression quantitative trait locus (eQTL) as a randomized instrument for a gene's expression level, and assign target genes based on distal eQTL associations. Mediation-based (ME) methods additionally require that distal eQTL associations are mediated by the source gene. Here we used Findr, a software package providing uniform implementations of IV, ME, and coexpression-based methods, together with a recent dataset of 1,012 segregants from a cross between two budding yeast strains and the YEASTRACT database of known transcriptional interactions, to compare causal gene network inference methods. We found that causal inference methods result in a significant overlap with the ground truth, whereas coexpression did not perform better than random. A subsampling analysis revealed that the performance of ME decreases at large sample sizes, due to a loss of sensitivity when residual correlations become significant. IV methods produce false positive predictions, due to genomic linkage between eQTL instruments. IV and ME methods also have complementary roles in identifying causal genes underlying transcriptional hotspots. IV methods correctly predicted STB5 targets for a hotspot centred on the transcription factor STB5, whereas ME failed because Stb5p auto-regulates its own expression. ME suggests a new candidate gene, DNM1, for a hotspot on Chr XII, where IV methods could not distinguish between multiple genes located within the hotspot.
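The contrast between the IV and ME criteria described in the abstract can be illustrated with a small simulation. The sketch below is not Findr and does not reproduce the paper's likelihood-ratio tests. It is a toy model built on several assumptions of our own: a binary genotype `E` acting as the local eQTL of a source gene `A`, a target gene `B` regulated by `A`, and a hypothetical hidden factor `H` that affects both genes with strength `confounding`. The IV-style evidence is the plain correlation between `E` and `B`; the ME-style evidence is the partial correlation between `E` and `B` given `A`, which should vanish if `A` fully mediates the effect.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after linearly adjusting both for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def simulate(n, confounding, seed=0):
    """Toy model E -> A -> B with an optional hidden factor H -> A, H -> B.

    All effect sizes are illustrative assumptions, not estimates from yeast data.
    """
    rng = random.Random(seed)
    E = [rng.choice([0, 1]) for _ in range(n)]   # genotype at the local eQTL of A
    H = [rng.gauss(0, 1) for _ in range(n)]      # hidden confounder (hypothetical)
    A = [e + confounding * h + rng.gauss(0, 1) for e, h in zip(E, H)]
    B = [a + confounding * h + rng.gauss(0, 1) for a, h in zip(A, H)]
    return E, A, B


def evidence(n, confounding, seed=0):
    """Return (IV-style, ME-style) summary statistics for the pair A -> B."""
    E, A, B = simulate(n, confounding, seed)
    iv_stat = pearson(E, B)          # distal association of the instrument with B
    me_stat = partial_corr(E, B, A)  # residual E-B association once A is accounted for
    return iv_stat, me_stat


if __name__ == "__main__":
    for c in (0.0, 1.0):
        iv_stat, me_stat = evidence(n=5000, confounding=c)
        print(f"confounding={c}: corr(E,B)={iv_stat:.3f}, corr(E,B|A)={me_stat:.3f}")
```

In this toy model the `E`-`B` correlation stays clearly non-zero in both settings, so the IV-style criterion still points to `A -> B`. Without the hidden factor, the partial correlation tends toward zero as the sample grows and the mediation criterion is satisfied. With the hidden factor, conditioning on `A` leaves a residual `E`-`B` correlation that becomes statistically detectable once the sample is large enough, which is one plausible reading of the loss of sensitivity reported for ME at large sample sizes.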
