论文标题

计算排名系统发育树之间最近的邻居互换距离

Computing nearest neighbour interchange distances between ranked phylogenetic trees

论文作者

Collienne, Lena, Gavryushkin, Alex

论文摘要

许多用于搜索叶片标记树空间的流行算法都是基于树木重排操作的。在任何此类操作下,问题都将减少为搜索顶点为树的图形,并且(无向)边缘由通过一个重新安排操作连接的树对(有时称为移动)给出。最受欢迎的是古典最近的邻居互换,子树修剪和再生,以及树的一分点和重新连接移动。但是,计算距离的问题在每个图中都是np-hard,这使得树的推理和比较算法在实践中具有挑战性。 尽管排名的系统发育树是癌症研究,免疫学和流行病学等应用中感兴趣的核心对象之一,但这些树木最短路径问题的计算复杂性数十年一直尚未解决。在本文中,我们通过确定复杂性取决于两种类型的树木重排(等级移动和边缘移动)之间的重量差来解决排名最接近的邻居互换操作,并从二次问题(这是该问题的最低复杂性)而变化,而NP-HARD是最高的。特别是,我们的结果提供了系统发育树的重排操作的第一个例子,最短路径及其距离可以有效地计算。具体而言,我们的算法缩放到有成千上万叶的树木(如果有效实施的,则可能数十万)。 我们还将排名树的图表中计算距离的问题与在不合格的树上的该问题的众所周知的版本联系起来,通过引入一个移动类型之间的重量差的参数。我们建议研究由该参数索引的最短路径问题的家族,计算复杂性从二次到NP- hard不等。

Many popular algorithms for searching the space of leaf-labelled trees are based on tree rearrangement operations. Under any such operation, the problem is reduced to searching a graph where vertices are trees and (undirected) edges are given by pairs of trees connected by one rearrangement operation (sometimes called a move). Most popular are the classical nearest neighbour interchange, subtree prune and regraft, and tree bisection and reconnection moves. The problem of computing distances, however, is NP-hard in each of these graphs, making tree inference and comparison algorithms challenging to design in practice. Although ranked phylogenetic trees are one of the central objects of interest in applications such as cancer research, immunology, and epidemiology, the computational complexity of the shortest path problem for these trees remained unsolved for decades. In this paper, we settle this problem for the ranked nearest neighbour interchange operation by establishing that the complexity depends on the weight difference between the two types of tree rearrangements (rank moves and edge moves), and varies from quadratic, which is the lowest possible complexity for this problem, to NP-hard, which is the highest. In particular, our result provides the first example of a phylogenetic tree rearrangement operation for which shortest paths, and hence the distance, can be computed efficiently. Specifically, our algorithm scales to trees with thousands of leaves (and likely hundreds of thousands if implemented efficiently). We also connect the problem of computing distances in our graph of ranked trees with the well-known version of this problem on unranked trees by introducing a parameter for the weight difference between move types. We propose to study a family of shortest path problems indexed by this parameter with computational complexity varying from quadratic to NP-hard.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源