Artificial intelligence has transformed medicinal chemistry through many impressive applications, but these applications depend on large numbers of training samples with high-quality annotations, which severely limits the adoption of data-driven methods. In this paper, we focus on reaction yield prediction, which helps chemists select high-yield reactions in a new chemical space using only a few experimental trials. To address this challenge, we first propose MetaRF, an attention-based differentiable random forest model designed specifically for few-shot yield prediction. The attention weights of the random forest are optimized automatically by a meta-learning framework and can be adapted quickly to predict the performance of new reagents given a few additional samples. To improve few-shot learning performance, we further introduce a dimension-reduction-based sampling method that identifies valuable samples to be tested experimentally and then learned from. We evaluate our methodology on three different datasets, where it achieves satisfactory few-shot prediction performance. On high-throughput experimentation (HTE) datasets, the average yield of the top 10 high-yield reactions selected by our methodology is relatively close to that of an ideal yield selection.
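To make the idea of adaptable attention weights over a random forest concrete, the following is a minimal sketch and not the MetaRF implementation. It assumes the per-tree yield predictions are already available as a matrix, combines them with softmax attention weights, and adapts the attention logits with a few gradient steps on a small support set. The function names (`forest_predict`, `adapt_attention`), the simulated data, the yields scaled to [0, 1], and the zero initialization (standing in for a meta-learned initialization) are all assumptions made for illustration.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    e = np.exp(z - z.max())
    return e / e.sum()


def forest_predict(tree_preds, theta):
    """Attention-weighted forest output.

    tree_preds: (n_samples, n_trees) per-tree yield predictions.
    theta:      (n_trees,) attention logits; softmax(theta) are the weights.
    """
    return tree_preds @ softmax(theta)


def mse(tree_preds, theta, y):
    """Mean squared error of the weighted forest on (tree_preds, y)."""
    return float(np.mean((forest_predict(tree_preds, theta) - y) ** 2))


def adapt_attention(theta, tree_preds, y, lr=1.0, steps=200):
    """Adapt attention logits with plain gradient descent on support-set MSE.

    Hypothetical helper: this is the few-shot adaptation step only; the
    meta-learning of the initial logits across tasks is not shown.
    """
    theta = theta.copy()
    for _ in range(steps):
        w = softmax(theta)
        resid = tree_preds @ w - y
        grad_w = 2.0 * tree_preds.T @ resid / len(y)  # dL/dw
        grad_theta = w * (grad_w - w @ grad_w)        # chain rule through softmax
        theta -= lr * grad_theta
    return theta


# Simulated example (assumption): 8 "trees" with increasing noise levels.
rng = np.random.default_rng(0)
y_true = rng.uniform(0.0, 1.0, size=40)               # yields scaled to [0, 1]
noise = np.linspace(0.02, 0.4, 8)
tree_preds = y_true[:, None] + rng.normal(size=(40, 8)) * noise

support, query = slice(0, 10), slice(10, 40)          # few-shot split
theta0 = np.zeros(8)                                  # stand-in for a meta-learned init
theta1 = adapt_attention(theta0, tree_preds[support], y_true[support])

print("support MSE before/after:",
      mse(tree_preds[support], theta0, y_true[support]),
      mse(tree_preds[support], theta1, y_true[support]))
print("query MSE before/after:",
      mse(tree_preds[query], theta0, y_true[query]),
      mse(tree_preds[query], theta1, y_true[query]))
print("adapted attention weights:", np.round(softmax(theta1), 3))
```

Parameterizing the weights through a softmax keeps them positive and summing to one, so the adapted forest remains a convex combination of its trees, and only a handful of parameters need to be updated from the few additional samples.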