Paper Title

From Best Hits to Best Matches

Authors

Stadler, Peter F., Geiß, Manuela, Schaller, David, Sánchez, Alitzel López, González, Marcos E., Valdivia, Dulce I., Hellmuth, Marc, Rosales, Maribel Hernández

Abstract

Many commonly used methods for orthology detection start from mutually most similar pairs of genes (reciprocal best hits) as an approximation for evolutionarily most closely related pairs of genes (reciprocal best matches). This approximation of best matches by best hits becomes exact for ultrametric dissimilarities, i.e., under the Molecular Clock Hypothesis. It fails, however, whenever there are large lineage-specific rate variations among paralogous genes. In practice, this introduces a high level of noise into the input data for best-hit-based orthology detection methods. If additive distances between genes are known, then evolutionarily most closely related pairs can be identified by considering certain quartets of genes, provided that in each quartet the outgroup relative to the remaining three genes is known. A priori knowledge of the underlying species phylogeny greatly facilitates the identification of the required outgroup. Although the workflow remains a heuristic, since the correct outgroup cannot be determined reliably in all cases, simulations with lineage-specific biases and rate asymmetries show that nearly perfect results can be achieved. In a realistic setting, where distance data have to be estimated from sequence data and hence are noisy, it is still possible to obtain highly accurate sets of best matches. Improvements of tree-free orthology assessment methods can be expected from a combination of the accurate inference of best matches reported here and recent mathematical advances in the understanding of (reciprocal) best match graphs and orthology relations.
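The contrast between best hits and best matches, and the role of the quartet with a known outgroup, can be illustrated with a small sketch. It is not the authors' workflow: the gene names `x`, `y1`, `y2`, `z`, the branch lengths, and the helper functions are all hypothetical, the distances are assumed to be exactly additive, and `z` is assumed to be a correct outgroup. The sketch relies only on the standard four-point condition: for an additive distance, the smallest of the three pairwise sums identifies the quartet split, and with `z` as outgroup that split tells which candidate shares the more recent common ancestor with `x`.

```python
def make_dist(pairs):
    """Store a symmetric distance as a dict keyed by unordered gene pairs."""
    return {frozenset(k): v for k, v in pairs.items()}


def dist(d, a, b):
    """Look up the distance between genes a and b."""
    return 0.0 if a == b else d[frozenset((a, b))]


def best_hit(d, x, candidates):
    """Best hit: the candidate with the smallest distance to x."""
    return min(candidates, key=lambda y: dist(d, x, y))


def closer_by_quartet(d, x, y1, y2, z, tol=1e-9):
    """Decide which of y1, y2 is more closely related to x.

    Assumes d is additive and z is an outgroup to {x, y1, y2}.
    Returns y1 or y2, or None if both are equally related to x.
    """
    s_xy1 = dist(d, x, y1) + dist(d, y2, z)  # split x y1 | y2 z
    s_xy2 = dist(d, x, y2) + dist(d, y1, z)  # split x y2 | y1 z
    s_xz = dist(d, x, z) + dist(d, y1, y2)   # split x z  | y1 y2
    if s_xy1 < min(s_xy2, s_xz) - tol:
        return y1
    if s_xy2 < min(s_xy1, s_xz) - tol:
        return y2
    return None


# Hypothetical toy tree ((x:1, y1:5):1, y2:1):1 with outgroup z:3.
# y1 is the closest relative of x but sits on a fast-evolving branch.
d = make_dist({
    ("x", "y1"): 6, ("x", "y2"): 3, ("y1", "y2"): 7,
    ("x", "z"): 6, ("y1", "z"): 10, ("y2", "z"): 5,
})

hit = best_hit(d, "x", ["y1", "y2"])
match = closer_by_quartet(d, "x", "y1", "y2", "z")
print("best hit:", hit)
print("best match via quartet:", match)
```

In this toy example the long branch leading to `y1` makes `y2` the most similar gene to `x`, while the quartet test still recovers `y1` as the closest relative. With distances estimated from sequence data the three sums are noisy, so a practical version would need a more careful tie and tolerance rule than the fixed `tol` used here.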
