Recent advances in machine learning have enabled accurate prediction of chemical properties. However, supervised methods in this domain often suffer from label scarcity, because measuring chemical properties experimentally is expensive. This work modifies a state-of-the-art molecule generation method, the Junction Tree Variational Autoencoder (JT-VAE), to support semi-supervised learning for chemical property prediction. Furthermore, we use this partial supervision to force some latent variables to take on consistent, interpretable roles, such as representing toxicity. We leverage the JT-VAE architecture to learn, from a partially labelled dataset, an interpretable representation suited to tasks ranging from molecular property prediction to conditional molecule generation.
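To make the idea of partial supervision concrete, the following is a minimal sketch of how a supervised term might be attached to a VAE objective when only some molecules carry a property label. It is not the objective used in this work, which the abstract does not specify. The function names, the choice of squared error, the `weight` parameter, and the convention that `None` marks an unlabelled molecule are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def masked_property_loss(z_property: Sequence[float],
                         labels: Sequence[Optional[float]]) -> float:
    """Mean squared error between a designated latent coordinate and the
    property label, averaged over labelled molecules only (assumed form)."""
    errors = [(z - y) ** 2 for z, y in zip(z_property, labels) if y is not None]
    # With no labelled molecules in the batch, the supervised term is zero.
    return sum(errors) / len(errors) if errors else 0.0


def semi_supervised_loss(recon_loss: float,
                         kl_loss: float,
                         z_property: Sequence[float],
                         labels: Sequence[Optional[float]],
                         weight: float = 1.0) -> float:
    """Unsupervised VAE terms plus a weighted supervised term.

    recon_loss and kl_loss stand in for the usual reconstruction and KL
    terms, computed elsewhere; weight is a hypothetical trade-off factor.
    """
    return recon_loss + kl_loss + weight * masked_property_loss(z_property, labels)
```

In this sketch, unlabelled molecules still contribute through the reconstruction and KL terms, while labelled ones additionally pull the designated latent coordinate toward the measured property value, which is one way such a coordinate could come to represent a property like toxicity.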