Modern drug discovery is often time-consuming, complex and costly due to the large volume of molecular data and the complexity of molecular properties. Recently, machine learning algorithms have shown promising results in virtual screening for automated drug discovery by predicting molecular properties. While emerging learning methods such as graph neural networks and recurrent neural networks achieve high accuracy, they are also notoriously computation-intensive and memory-intensive, relying on operations such as feature embeddings or deep convolutions. In this paper, we propose a viable alternative to existing learning methods by presenting MoleHD, a method based on brain-inspired hyperdimensional computing (HDC) for molecular property prediction. We develop HDC encoders that project the SMILES representation of a molecule into high-dimensional vectors, which are then used for HDC training and inference. We perform an extensive evaluation using 29 classification tasks from 3 widely used molecular datasets (ClinTox, BBBP, SIDER) under three split methods (random, scaffold, and stratified). Through a comprehensive comparison with 8 existing learning models, including SOTA graph and recurrent neural networks, we show that MoleHD achieves the highest average ROC-AUC score across the 3 datasets on the random and scaffold splits, and the second-highest on the stratified split. Importantly, MoleHD achieves this performance with significantly reduced computing cost and training effort. To the best of our knowledge, this is the first HDC-based method for drug discovery. The promising results presented in this paper could open a new path in drug discovery research.
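The abstract describes the HDC pipeline only at a high level: encode a SMILES string into a hypervector, train, then infer. The sketch below illustrates a generic HDC classifier of that shape under stated assumptions; it is not MoleHD's actual encoder. Assumed choices: character-level SMILES tokens, a random bipolar item memory, n-gram encoding by cyclic-shift permutation plus elementwise binding, class hypervectors formed by bundling (summing) training encodings, and cosine-similarity inference. The names `SmilesHDC`, `encode`, `fit`, `predict`, `permute`, `cosine` and the parameters `D = 10_000` and `n = 3` are hypothetical.

```python
import math
import random

D = 10_000  # hypervector dimensionality (assumed; ~10k is a common HDC choice)


def random_hv(rng):
    """Return a random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(D)]


def permute(hv, k):
    """Cyclically shift hv right by k positions to encode token position."""
    k %= len(hv)
    return hv[-k:] + hv[:-k] if k else list(hv)


def cosine(a, b):
    """Cosine similarity between two integer vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class SmilesHDC:
    """Hypothetical HDC classifier over SMILES strings (not MoleHD's exact encoder)."""

    def __init__(self, n=3, seed=0):
        self.n = n                      # n-gram size (assumed)
        self.rng = random.Random(seed)  # fixed seed for a reproducible item memory
        self.item_memory = {}           # token -> random bipolar hypervector
        self.class_hvs = {}             # label -> bundled (summed) hypervector

    def _item(self, token):
        # Lazily assign each new token a random, nearly orthogonal hypervector.
        if token not in self.item_memory:
            self.item_memory[token] = random_hv(self.rng)
        return self.item_memory[token]

    def encode(self, smiles):
        """Map a SMILES string to a bipolar hypervector via n-gram binding and bundling."""
        tokens = list(smiles)  # character-level tokenization (assumption)
        acc = [0] * D
        for i in range(max(1, len(tokens) - self.n + 1)):
            bound = [1] * D
            for j, tok in enumerate(tokens[i:i + self.n]):
                # Bind: elementwise product of position-permuted token vectors.
                bound = [a * b for a, b in zip(bound, permute(self._item(tok), j))]
            # Bundle: accumulate all n-gram hypervectors.
            acc = [a + b for a, b in zip(acc, bound)]
        # Binarize back to {-1, +1}; ties map to +1.
        return [1 if a >= 0 else -1 for a in acc]

    def fit(self, smiles_list, labels):
        """Single-pass training: sum the encodings of each class's molecules."""
        for smiles, label in zip(smiles_list, labels):
            acc = self.class_hvs.setdefault(label, [0] * D)
            for i, v in enumerate(self.encode(smiles)):
                acc[i] += v
        return self

    def predict(self, smiles):
        """Return the label whose class hypervector is most similar to the query."""
        query = self.encode(smiles)
        return max(self.class_hvs, key=lambda y: cosine(query, self.class_hvs[y]))


if __name__ == "__main__":
    # Toy molecules with arbitrary labels, only to exercise the API.
    model = SmilesHDC(n=3, seed=0).fit(["CCO", "CCN", "c1ccccc1", "c1ccncc1"], [0, 0, 1, 1])
    print(model.predict("CCO"), model.predict("c1ccccc1"))
```

The toy labels are arbitrary and are not drawn from ClinTox, BBBP, or SIDER. Training here is a single bundling pass; HDC classifiers often add iterative retraining on misclassified samples, which this sketch omits. For ROC-AUC evaluation, one would rank molecules by a continuous score (for example, the cosine-similarity margin between classes) rather than by the hard label returned by `predict`.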