Computationally generating novel, synthetically accessible compounds with high affinity and low toxicity is a major challenge in drug design. Machine-learning models that go beyond conventional pharmacophoric methods have shown promise in generating novel small-molecule compounds, but they require significant tuning for a specific protein target. Here, we introduce a method called selective iterative latent variable refinement (SILVR) for conditioning an existing diffusion-based equivariant generative model without retraining. The method allows the generation of new molecules that fit into the binding site of a protein, based on fragment hits. We use the SARS-CoV-2 main protease fragments from Diamond X-Chem that form part of the COVID Moonshot project as a reference dataset for conditioning the molecule generation. The SILVR rate controls the extent of conditioning, and we show that moderate SILVR rates make it possible to generate new molecules similar in shape to the original fragments, meaning that the new molecules fit the binding site without knowledge of the protein. We can also merge up to three fragments into a new molecule without affecting the quality of the molecules generated by the underlying generative model. Our method is generalizable to any protein target with known fragments and to any diffusion-based model for molecule generation.