Stereoselective reactions (both chemical and enzymatic) have been essential to the origin of life, evolution, human biology, and medicine. Since the late 1960s, there have been numerous successes in the exciting frontier of asymmetric catalysis. However, most industrial and academic asymmetric catalysis today still follows a trial-and-error model, because the energy difference between success and failure in asymmetric catalysis is incredibly small. Our current understanding of stereoselective reactions is mostly qualitative: stereoselectivity arises from differences in steric and electronic effects among multiple competing mechanistic pathways. Quantitatively understanding and modulating the stereoselectivity of a given chemical reaction remains extremely difficult. As a proof of principle, we present a novel machine learning technique that combines a LASSO model and two Random Forest models via two Gaussian Mixture models to quantitatively predict the stereoselectivity of chemical reactions. Compared to the recent ground-breaking approach [1], our approach captures interactions between features and exploits complex data distributions, both of which are important for predicting stereoselectivity. Experimental results on a recently published dataset demonstrate that our approach significantly outperforms [1]. The insights obtained from our results provide a solid foundation for further exploration of other synthetically valuable yet mechanistically intriguing stereoselective reactions.
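The abstract does not specify how the two Gaussian Mixture models combine the LASSO model with the two Random Forest models. As one hypothetical reading, the sketch below (Python with scikit-learn) uses one mixture as an outer gate that softly splits samples between the LASSO model and the forests, and a second mixture as an inner gate that splits samples between the two forests; each model is trained on all data, weighted by its gate responsibilities. The class name `GatedEnsemble`, the soft-gating scheme, the synthetic data, and all hyperparameters are assumptions for illustration, not the authors' published architecture.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.mixture import GaussianMixture


class GatedEnsemble:
    """Hypothetical soft-gated ensemble: one LASSO and two random forests,
    routed by two Gaussian mixture models (an illustrative assumption)."""

    def __init__(self, alpha=0.01, n_trees=100, seed=0):
        # Outer gate (assumed): LASSO regime vs. random-forest regime.
        self.gate_outer = GaussianMixture(n_components=2, random_state=seed)
        # Inner gate (assumed): forest A vs. forest B.
        self.gate_inner = GaussianMixture(n_components=2, random_state=seed)
        self.lasso = Lasso(alpha=alpha, max_iter=10_000)
        self.forests = [
            RandomForestRegressor(n_estimators=n_trees, random_state=seed + k)
            for k in range(2)
        ]

    def _weights(self, X):
        # p: responsibility of the forest regime; q: split between the two forests.
        p = self.gate_outer.predict_proba(X)[:, 1]
        q = self.gate_inner.predict_proba(X)
        return p, q

    def fit(self, X, y, eps=1e-3):
        self.gate_outer.fit(X)
        p = self.gate_outer.predict_proba(X)[:, 1]
        # Fit the inner gate on samples assigned to the forest regime,
        # falling back to all samples if that regime is too small.
        mask = p >= 0.5
        self.gate_inner.fit(X[mask] if mask.sum() >= 20 else X)
        p, q = self._weights(X)
        # Each model sees every sample, weighted by its gate responsibility;
        # eps keeps all weights strictly positive.
        self.lasso.fit(X, y, sample_weight=(1 - p) + eps)
        for k, forest in enumerate(self.forests):
            forest.fit(X, y, sample_weight=p * q[:, k] + eps)
        return self

    def predict(self, X):
        p, q = self._weights(X)
        rf = sum(q[:, k] * f.predict(X) for k, f in enumerate(self.forests))
        # Convex combination of the LASSO and the gated forest predictions.
        return (1 - p) * self.lasso.predict(X) + p * rf


# Usage on synthetic data (a stand-in for a selectivity target): a mostly
# linear response plus one feature interaction the forests can capture.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 2 * X[:, 0] - X[:, 1] + 0.5 * X[:, 2] * X[:, 3] + 0.1 * rng.normal(size=300)
model = GatedEnsemble().fit(X, y)
pred = model.predict(X)
r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
```

Soft gating is used here so that every model sees every sample (with a small weight outside its regime), which avoids empty branches when a mixture component captures few points; a hard-routing variant that trains each model only on its assigned cluster is an equally plausible reading of the abstract.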