Predicting complete, step-by-step chemical reaction mechanisms (CRMs) remains a major challenge. Traditional approaches to CRM tasks rely on expert-driven experiments or costly quantum chemical computations, whereas contemporary deep learning (DL) alternatives ignore key intermediates and mechanistic steps and often suffer from hallucinations. We present DeepMech, an interpretable graph-based DL framework that employs atom- and bond-level attention, guided by generalized templates of mechanistic operations (TMOps), to generate CRMs. Trained on our curated ReactMech dataset (~30K CRMs with 100K atom-mapped and mass-balanced elementary steps), DeepMech achieves 98.98 ± 0.12% accuracy in predicting elementary steps and 95.94 ± 0.21% in complete CRM tasks, while maintaining high fidelity in out-of-distribution scenarios and in predicting side products and/or byproducts. Extending DeepMech to multistep CRMs relevant to prebiotic chemistry demonstrates its ability to reconstruct two pathways from simple primordial substrates to complex biomolecules such as serine and aldopentose. Attention analysis identifies reactive atoms and bonds in line with chemical intuition, rendering the model interpretable and suitable for reaction design.