Paper title

Statistical learning with phylogenetic network invariants

Authors

Travis Barton, Elizabeth Gross, Colby Long, Joseph Rusinko

Abstract

Phylogenetic networks provide a means of describing the evolutionary history of sets of species believed to have undergone hybridization or gene flow during their evolution. The mutation process for a set of such species can be modeled as a Markov process on a phylogenetic network. Previous work has shown that site-pattern probability distributions from a Jukes-Cantor phylogenetic network model must satisfy certain algebraic invariants. As a corollary, aspects of the phylogenetic network are theoretically identifiable from site-pattern frequencies. In practice, because of the probabilistic nature of sequence evolution, the phylogenetic network invariants will rarely be satisfied exactly, even for data generated under the model. Thus, using network invariants to infer phylogenetic networks requires some means of interpreting the residuals, or deviations from zero, obtained when observed site-pattern frequencies are substituted into the invariants. In this work, we propose a method that uses invariant residuals and support vector machines to infer 4-leaf level-one phylogenetic networks, from which larger networks can be reconstructed. Given data for a set of species, the support vector machine is first trained on model data to learn the patterns of residuals corresponding to different network structures, and is then used to classify the network that produced the data. We demonstrate the performance of our method on data simulated from the specified model and on primate data.
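To make the idea of an invariant residual concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a hypothetical toy alignment of four sequences. It also swaps in the simple symmetry invariants of the Jukes-Cantor model (site patterns that differ only by a relabeling of nucleotides have equal probability) in place of the network-distinguishing invariants used in the paper, which are not given in the abstract. The function names `site_pattern_frequencies` and `residual` are illustrative only.

```python
from collections import Counter


def site_pattern_frequencies(alignment):
    """Return the relative frequency of each column pattern in an alignment."""
    n_sites = len(alignment[0])
    counts = Counter(
        "".join(seq[i] for seq in alignment) for i in range(n_sites)
    )
    return {pattern: count / n_sites for pattern, count in counts.items()}


def residual(freqs, pattern_a, pattern_b):
    """Residual of the linear invariant p[pattern_a] - p[pattern_b] = 0.

    Under the model the two probabilities are equal, so the invariant is
    exactly zero; with observed frequencies it generally is not.
    """
    return freqs.get(pattern_a, 0.0) - freqs.get(pattern_b, 0.0)


# Hypothetical toy alignment: four taxa, eight sites (not real data).
alignment = [
    "AACGTTAC",
    "AACGTTAC",
    "ACCGTTAG",
    "ACCGATAG",
]

freqs = site_pattern_frequencies(alignment)

# Stand-in invariants: pairs of patterns related by a nucleotide relabeling.
# These are assumptions for illustration, not the paper's network invariants.
invariant_pairs = [("AAAA", "CCCC"), ("AACC", "CCGG"), ("TTTA", "AAAC")]

residual_vector = [residual(freqs, a, b) for a, b in invariant_pairs]
print(residual_vector)  # [0.125, 0.0, 0.125]
```

Even for this tiny alignment two of the three residuals are nonzero, which is the sampling effect the abstract describes. In the approach outlined above, a vector of such residuals (computed from the actual network invariants) would serve as the feature vector that the support vector machine learns to classify.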
