论文标题
使用PQ-Trees近似搜索新基因组中的已知基因簇
Approximate Search for Known Gene Clusters in New Genomes Using PQ-Trees
论文作者
论文摘要
我们在比较基因组学(表示为pq-Tree搜索)中定义了一个新问题,该搜索是输入pq-tree $ t $,代表了一个基因兴趣基因群的已知基因顺序,基因到基因替代得分$ h $ $ h $,integer参数$ d_t $ d_t $和$ d_s $和$ d_s $,以及新的基因$ $ $ $ $。目的是在$ s $中识别基因簇的新实例,这些新实例可能与基因组重排的已知基因顺序不同,这些基因组重排受到$ t $约束的基因替代基因替代基因,这些基因替代基因由$ h $控制,而基因替代基因和基因的删除和插入的基因替代,这些替代基于上面的$ d_t $和$ d_t $和$ d_t $和$ d_s $。我们证明,PQ-Tree搜索问题是NP硬化,并提出了一种参数化算法,该算法求解了$ o^*(2^γ)$ time中PQ-Tree搜索的优化变体,其中$γ$是$ t $和$ o^*$的最大节点的程度,用于隐藏输入尺寸中的polynomial。该算法被用作搜索工具,表示为pqfinder,并应用于在质粒中搜索质体基因簇的实例,在1,487个原核基因组的数据集中。我们报告了29个在质粒中重新排列的染色体基因簇,其中重排在相应的PQ-Tree引导。这些结果之一是为重金属外排泵编码,进一步分析了如何利用PQFinder来揭示知名基因簇的有趣新结构变体。该工具的代码以及重建结果所需的所有数据都可以在GitHub上公开可用(github.com/galiazim/pqfinder)。
We define a new problem in comparative genomics, denoted PQ-Tree Search, that takes as input a PQ-tree $T$ representing the known gene orders of a gene cluster of interest, a gene-to-gene substitution scoring function $h$, integer parameters $d_T$ and $d_S$, and a new genome $S$. The objective is to identify in $S$ approximate new instances of the gene cluster that could vary from the known gene orders by genome rearrangements that are constrained by $T$, by gene substitutions that are governed by $h$, and by gene deletions and insertions that are bounded from above by $d_T$ and $d_S$, respectively. We prove that the PQ-Tree Search problem is NP-hard and propose a parameterized algorithm that solves the optimization variant of PQ-Tree Search in $O^*(2^γ)$ time, where $γ$ is the maximum degree of a node in $T$ and $O^*$ is used to hide factors polynomial in the input size. The algorithm is implemented as a search tool, denoted PQFinder, and applied to search for instances of chromosomal gene clusters in plasmids, within a dataset of 1,487 prokaryotic genomes. We report on 29 chromosomal gene clusters that are rearranged in plasmids, where the rearrangements are guided by the corresponding PQ-tree. One of these results, coding for a heavy metal efflux pump, is further analysed to exemplify how PQFinder can be harnessed to reveal interesting new structural variants of known gene clusters. The code for the tool as well as all the data needed to reconstruct the results are publicly available on GitHub (github.com/GaliaZim/PQFinder).