Title
Enhanced compound-protein binding affinity prediction by representing protein multimodal information via a coevolutionary strategy
Authors
Abstract
Machine learning methods for predicting compound-protein binding affinity (CPA) still suffer from low accuracy. This is because there is no efficient way to represent the multimodal information of a protein, including both its structure and its sequence. To overcome this limitation, we develop a novel end-to-end architecture, named FeatNN. In FeatNN, a coevolutionary strategy jointly represents the structure and sequence features of proteins and ultimately optimizes the mathematical model used to predict CPA. Furthermore, taking a data-driven perspective, we propose a rational method that uses both high- and low-quality databases to improve the accuracy and generalization ability of FeatNN in CPA prediction tasks. Notably, we visually interpret how sequence and structure features interact within the rationally designed architecture. In virtual drug screening tasks, FeatNN considerably outperforms the state-of-the-art (SOTA) baseline, indicating that the approach is feasible for practical use. By efficiently representing the multimodal information of proteins via a coevolutionary strategy, FeatNN achieves higher CPA prediction accuracy and better generalization ability.
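The abstract does not specify FeatNN's actual update rules. The sketch below is therefore only a hypothetical toy illustration of what "jointly representing" sequence and structure can mean in general: two representations are refined in alternation, each conditioned on the other. It assumes per-residue sequence features of shape (L, d) and a pairwise structural map of shape (L, L). The function `coevolve` and the weights `w_seq` and `w_pair` are illustrative names invented here, not part of FeatNN.

```python
import numpy as np


def coevolve(seq_feat, pair_feat, w_seq, w_pair, n_iters=2):
    """Hypothetical toy mutual update of sequence and structure features.

    seq_feat:  (L, d) per-residue embedding (assumed derived from the sequence)
    pair_feat: (L, L) pairwise structural map (assumed, e.g. distance-derived)
    w_seq:     (d, d) illustrative projection for the structure -> sequence step
    w_pair:    (d, d) illustrative projection for the sequence -> structure step
    """
    d = seq_feat.shape[1]
    for _ in range(n_iters):
        # Structure -> sequence: row-wise softmax over the pair map gives each
        # residue a weighting over all residues, used to aggregate their features.
        attn = np.exp(pair_feat - pair_feat.max(axis=-1, keepdims=True))
        attn /= attn.sum(axis=-1, keepdims=True)
        seq_feat = seq_feat + np.tanh(attn @ seq_feat @ w_seq)
        # Sequence -> structure: add a scaled bilinear residue-residue score
        # computed from the freshly updated sequence features.
        pair_feat = pair_feat + (seq_feat @ w_pair) @ seq_feat.T / np.sqrt(d)
    return seq_feat, pair_feat


# Usage with random inputs (illustrative only).
rng = np.random.default_rng(0)
L, d = 5, 8
seq0 = rng.normal(size=(L, d))
pair0 = rng.normal(size=(L, L))
w_seq = 0.1 * rng.normal(size=(d, d))
w_pair = 0.1 * rng.normal(size=(d, d))
seq_out, pair_out = coevolve(seq0, pair0, w_seq, w_pair)
```

In this sketch, the alternation matters: each pair-map update sees sequence features that have already absorbed structural context. This is the general intuition behind letting the two modalities "coevolve" rather than encoding each one independently and concatenating the results.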