Leveraging domain knowledge, including fingerprints and functional groups, in molecular representation learning is crucial for chemical property prediction and drug discovery. Existing works model the relation between graph structure and molecular properties only implicitly, so they can hardly capture structural or property changes and complex structures, given the much smaller atom vocabulary and highly frequent atoms of molecular graphs. In this paper, we propose the Contrastive Knowledge-aware GNN (CKGNN) for self-supervised molecular representation learning, which fuses domain knowledge into the molecular graph representation. We explicitly encode domain knowledge via a knowledge-aware molecular encoder under the contrastive learning framework. This ensures that the generated molecular embeddings are equipped with chemical domain knowledge and can distinguish molecules with similar chemical formulae but dissimilar functions. Extensive experiments on 8 public datasets demonstrate the effectiveness of our model, with a 6% absolute improvement on average over strong competitors. Ablation studies and further investigation also verify the benefit of combining the best of both worlds: incorporating chemical domain knowledge into self-supervised learning.
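The abstract does not specify the contrastive objective. As a hedged illustration only, the sketch below assumes a standard InfoNCE (NT-Xent-style) loss. Under this assumption, each molecule's structure-view embedding and its knowledge-view embedding form a positive pair, and the other molecules in the batch act as negatives. The function name `info_nce_loss`, the temperature value, and this pairing scheme are assumptions for illustration, not CKGNN's published design.

```python
import numpy as np


def info_nce_loss(z_graph: np.ndarray, z_knowledge: np.ndarray,
                  temperature: float = 0.1) -> float:
    """Hypothetical InfoNCE loss (an assumption, not the paper's exact objective).

    Row i of `z_graph` (structure view) and row i of `z_knowledge`
    (domain-knowledge view) are treated as a positive pair; every other
    row in the batch is treated as a negative.
    """
    # L2-normalise each embedding so dot products become cosine similarities
    zg = z_graph / np.linalg.norm(z_graph, axis=1, keepdims=True)
    zk = z_knowledge / np.linalg.norm(z_knowledge, axis=1, keepdims=True)

    # (batch, batch) similarity matrix scaled by the temperature
    logits = zg @ zk.T / temperature

    # Numerically stable row-wise log-softmax
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Positive pairs lie on the diagonal; minimise their negative log-likelihood
    return float(-np.mean(np.diag(log_prob)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(8, 16))
    # Knowledge view that closely matches the structure view (well aligned)
    aligned = info_nce_loss(z, z + 0.01 * rng.normal(size=z.shape))
    # Knowledge view paired with the wrong molecules (rows reversed)
    mismatched = info_nce_loss(z, z[::-1])
    print(aligned, mismatched)
```

In this sketch, a lower loss means each molecule's two views are closer to each other than to other molecules in the batch, which is the property the abstract relies on to separate molecules with similar formulae but different functions.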