Sampling useful three-dimensional molecular structures along with their most favorable conformations is a key challenge in drug discovery. Current state-of-the-art flow-matching and diffusion-based models for 3D de novo design are limited to generating a single conformation per molecule. However, the conformational landscape of a molecule determines its observable properties and how tightly it can bind to a given protein target. By generating a representative set of low-energy conformers, we can assess these properties more directly and potentially improve the ability to generate molecules with desired thermodynamic observables. To this end, we propose FlexiFlow, a novel architecture that extends flow-matching models to jointly sample molecules and multiple conformations of each, while preserving both equivariance and permutation invariance. We demonstrate the effectiveness of our approach on the QM9 and GEOM Drugs datasets, achieving state-of-the-art results in molecular generation tasks. Our results show that FlexiFlow generates valid, unstrained, unique, and novel molecules with high fidelity to the training data distribution, while also capturing their conformational diversity. Moreover, we show that our model generates conformational ensembles whose coverage is similar to that of state-of-the-art physics-based methods at a fraction of the inference time. Finally, FlexiFlow can be successfully transferred to the protein-conditioned ligand generation task, even when the dataset contains only static pockets without accompanying conformations.
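The abstract compares conformational ensembles by their coverage but does not define the metric. As an illustration only, the sketch below uses a common recall-style convention: the fraction of reference conformers that have at least one generated conformer within an RMSD threshold after optimal rigid alignment. The function names `kabsch_rmsd` and `coverage` and the 0.5 Å threshold are hypothetical choices, not taken from the paper. The sketch also assumes that all conformers share the same atom ordering, and it ignores molecular symmetry.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between two (n_atoms, 3) arrays after optimal rigid alignment."""
    # Remove translation by centering both conformers on their centroids.
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    # Singular values of the cross-covariance give the best achievable overlap.
    u, s, vt = np.linalg.svd(p.T @ q)
    # A negative determinant means the best orthogonal map is a reflection;
    # flipping the sign of the smallest singular value restricts to rotations.
    d = np.sign(np.linalg.det(u @ vt))
    overlap = s[0] + s[1] + d * s[2]
    msd = ((p ** 2).sum() + (q ** 2).sum() - 2.0 * overlap) / len(p)
    return float(np.sqrt(max(msd, 0.0)))  # clamp tiny negative round-off


def coverage(generated, reference, threshold=0.5):
    """Fraction of reference conformers matched by any generated conformer."""
    rmsd = np.array([[kabsch_rmsd(g, r) for g in generated] for r in reference])
    return float((rmsd.min(axis=1) <= threshold).mean())


# Toy data (not from the paper): a 4-atom conformer, a rotated and shifted
# copy of it, and a uniformly stretched variant that no generated sample matches.
ref = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
theta = 0.7
rot = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
gen = ref @ rot.T + np.array([1.0, 2.0, 3.0])

print(coverage([gen], [ref, 3.0 * ref]))  # 0.5: one of two references is covered
```

A precision-style counterpart swaps the roles of the two sets, and a production evaluation would typically use a symmetry-aware RMSD on heavy atoms rather than this plain alignment.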