Despite the prevalence and many successes of deep learning in de novo molecular design, the problem of generating peptides that target specific proteins remains unsolved. A main barrier is the scarcity of high-quality training data. To tackle this issue, we propose a novel machine-learning-based peptide design architecture called Latent Space Approximate Trajectory Collector (LSATC). It consists of a series of samplers on an optimization trajectory on a highly non-convex energy landscape that approximates the distributions of peptides with desired properties in a latent space. The process involves little human intervention and can be implemented in an end-to-end manner. We demonstrate the model by designing peptide extensions targeting Beta-catenin, a key nuclear effector protein involved in canonical Wnt signalling. Compared with a random sampler, LSATC samples peptides with $36\%$ lower binding scores in a $16$ times smaller interquartile range (IQR) and $284\%$ less hydrophobicity with a $1.4$ times smaller IQR. LSATC also largely outperforms other common generative models. Finally, we used a clustering algorithm to select $4$ peptides from the $100$ LSATC-designed peptides for experimental validation. The results confirm that all four peptides extended by LSATC show improved Beta-catenin binding by at least $20.0\%$, and two of the peptides show a $3$-fold increase in binding affinity compared to the base peptide.