Paper Title

Prompt-Guided Injection of Conformation to Pre-trained Protein Model

Paper Authors

Qiang Zhang, Zeyuan Wang, Yuqiang Han, Haoran Yu, Xurui Jin, Huajun Chen

Paper Abstract

Pre-trained protein models (PTPMs) represent a protein with one fixed embedding and thus cannot adapt to diverse tasks. For example, protein structures can shift between several conformations in various biological processes, a phenomenon known as protein folding. To enable PTPMs to produce task-aware representations, we propose to learn interpretable, pluggable and extensible protein prompts as a way of injecting task-related knowledge into PTPMs. In this regard, prior PTPM optimization with the masked language modeling task can be interpreted as learning a sequence prompt (Seq prompt) that enables PTPMs to capture the sequential dependency between amino acids. To incorporate conformational knowledge into PTPMs, we propose an interaction-conformation prompt (IC prompt) that is learned through back-propagation with the protein-protein interaction task. As an instantiation, we present a conformation-aware pre-trained protein model that learns both sequence and interaction-conformation prompts in a multi-task setting. We conduct comprehensive experiments on nine protein datasets. The results confirm our expectation that using the sequence prompt does not hurt PTPMs' performance on sequence-related tasks, while incorporating the interaction-conformation prompt significantly improves PTPMs' performance on tasks where conformational knowledge counts. We also show that the learned prompts can be combined and extended to handle new, complex tasks.
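The core idea of the abstract — learnable prompt vectors prepended to a frozen pre-trained model's input and trained by back-propagation on a downstream task — can be sketched in a few lines. This is a minimal illustration with NumPy, not the authors' implementation; the model (a frozen mean-pool-and-readout), the dimensions, and all function names are assumptions for exposition only.

```python
# Minimal sketch (illustrative only, not the paper's code) of prompt
# injection: a learnable prompt vector is prepended to a frozen model's
# input embeddings, and only the prompt is updated by back-propagation.
import numpy as np

rng = np.random.default_rng(0)
d = 8                        # embedding dimension (assumed)
W = rng.normal(size=(d, 1))  # frozen "PTPM" readout weights (stand-in)

def forward(prompt, seq_emb):
    # Prepend the prompt token to the sequence embeddings,
    # mean-pool over tokens, and apply the frozen readout.
    x = np.vstack([prompt, seq_emb]).mean(axis=0)
    return x @ W

def train_prompt(seq_emb, target, lr=0.05, steps=200):
    prompt = np.zeros((1, d))        # learnable IC-style prompt
    n = 1 + seq_emb.shape[0]         # token count after injection
    for _ in range(steps):
        err = forward(prompt, seq_emb) - target
        # gradient of the squared error w.r.t. the prompt (up to a
        # constant factor): chain rule through the mean-pool and readout
        prompt -= lr * (err * W.T) / n
    return prompt

seq = rng.normal(size=(5, d))        # stand-in amino-acid embeddings
p = train_prompt(seq, target=1.0)
loss0 = float((forward(np.zeros((1, d)), seq) - 1.0) ** 2)
loss1 = float((forward(p, seq) - 1.0) ** 2)
print(loss1 < loss0)                 # prompt tuning reduced the task loss
```

The point of the sketch is the division of labor the abstract describes: the pre-trained weights (`W`) stay fixed, so sequence-task behavior is untouched, while the prompt alone absorbs the task signal (here, a toy interaction target) through gradient descent.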
