Pre-trained protein models (PTPMs) represent a protein with one fixed embedding and are thus not well suited to diverse tasks. For example, protein structures can shift between several conformations in various biological processes, a phenomenon known as protein folding. To enable PTPMs to produce task-aware representations, we propose learning interpretable, pluggable, and extensible protein prompts as a way of injecting task-related knowledge into PTPMs. In this regard, prior PTPM optimization with the masked language modeling task can be interpreted as learning a sequence prompt (Seq prompt) that enables PTPMs to capture the sequential dependency between amino acids. To incorporate conformational knowledge into PTPMs, we propose an interaction-conformation prompt (IC prompt) that is learned through back-propagation on the protein-protein interaction task. As an instantiation, we present a conformation-aware pre-trained protein model that learns both the sequence and interaction-conformation prompts in a multi-task setting. We conduct comprehensive experiments on nine protein datasets. The results confirm our expectation: using the sequence prompt does not hurt PTPMs' performance on sequence-related tasks, while incorporating the interaction-conformation prompt significantly improves PTPMs' performance on tasks where conformational knowledge counts. We also show that the learned prompts can be combined and extended to handle new, complex tasks.
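To make the pluggable-prompt idea concrete, below is a minimal sketch of how task prompts could be prepended to a PTPM's input in a multi-task setting. It assumes a PyTorch-style encoder; the class and parameter names (PromptedPTPM, seq_prompt, ic_prompt, prompt_len) are hypothetical illustrations, not the paper's released code.

```python
# A minimal sketch of pluggable prompt learning, assuming a PyTorch-style
# encoder that maps (batch, length, d_model) -> (batch, length, d_model).
import torch
import torch.nn as nn

class PromptedPTPM(nn.Module):
    def __init__(self, encoder: nn.Module, d_model: int, prompt_len: int = 4):
        super().__init__()
        self.encoder = encoder  # pre-trained protein model backbone
        # Two pluggable prompts: the Seq prompt is tied to the masked-LM
        # task, the IC prompt to the protein-protein interaction task.
        self.seq_prompt = nn.Parameter(torch.randn(prompt_len, d_model))
        self.ic_prompt = nn.Parameter(torch.randn(prompt_len, d_model))

    def forward(self, residue_embeds: torch.Tensor, use_ic: bool) -> torch.Tensor:
        # residue_embeds: (batch, seq_len, d_model) amino-acid embeddings
        prompt = self.ic_prompt if use_ic else self.seq_prompt
        prompt = prompt.unsqueeze(0).expand(residue_embeds.size(0), -1, -1)
        # Prepend the task prompt so the encoder yields task-aware representations.
        return self.encoder(torch.cat([prompt, residue_embeds], dim=1))

# Usage sketch: a toy Transformer stands in for a real PTPM backbone.
enc = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True),
    num_layers=1,
)
model = PromptedPTPM(enc, d_model=32)
x = torch.randn(2, 10, 32)          # (batch, residues, d_model)
seq_repr = model(x, use_ic=False)   # Seq-prompted representation
ic_repr = model(x, use_ic=True)     # conformation-aware representation
```

In the multi-task training loop suggested by the abstract, the masked language modeling loss would back-propagate into the Seq prompt while the protein-protein interaction loss back-propagates into the IC prompt, so each prompt absorbs its task's knowledge independently.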