Molecular property prediction is essential in chemistry, especially for drug discovery applications. However, available molecular property data is often limited, which encourages transferring information from related data. Transfer learning has had a tremendous impact in fields such as Computer Vision and Natural Language Processing, signaling its potential for molecular property prediction. We present a pre-training procedure for molecular representation learning that uses reaction data, and we use it to pre-train a SMILES Transformer. We fine-tune and evaluate the pre-trained model on 12 molecular property prediction tasks from MoleculeNet within physical chemistry, biophysics, and physiology, and show a statistically significant positive effect on 5 of the 12 tasks compared to a non-pre-trained baseline model.
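As a rough illustration of how reaction data might be turned into pre-training examples, the following sketch assumes reactions are stored as Daylight-style reaction SMILES (`reactants>agents>products`) and tokenizes them with a regex tokenizer of the kind popularized by the Molecular Transformer. Pairing reactants with products as source and target sequences is an assumption, as are the helper names `tokenize_smiles` and `reaction_to_pair`; the abstract does not specify the actual pre-training objective or tokenizer.

```python
import re

# Regex SMILES tokenizer in the style of the Molecular Transformer.
# Assumption: not necessarily the tokenizer used for the SMILES Transformer here.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|\%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into atom, bond, branch, and ring-closure tokens."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # findall silently skips unmatched characters, so verify full coverage.
    if "".join(tokens) != smiles:
        raise ValueError(f"Untokenizable SMILES: {smiles!r}")
    return tokens


def reaction_to_pair(reaction_smiles: str) -> tuple[list[str], list[str]]:
    """Hypothetical pairing: reactants (plus agents) as source, products as target."""
    parts = reaction_smiles.split(">")
    if len(parts) != 3:
        raise ValueError(f"Expected 'reactants>agents>products', got {reaction_smiles!r}")
    reactants, agents, products = parts
    # Append agents (catalysts, reagents) to the source side if present.
    source = f"{reactants}.{agents}" if agents else reactants
    return tokenize_smiles(source), tokenize_smiles(products)


if __name__ == "__main__":
    # Fischer esterification of ethanol and acetic acid (water omitted).
    src, tgt = reaction_to_pair("CCO.CC(=O)O>[H+]>CC(=O)OCC")
    print(src)
    print(tgt)
```

Treating each reaction as a source/target pair is only one option; masked-token or contrastive objectives over the same reaction SMILES would reuse the same tokenizer.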