Graph neural networks (GNNs) have been proposed for a wide range of graph-related learning tasks. In particular, a growing number of GNN systems have recently been applied to predict molecular properties. However, the space of possible hyperparameter settings for a GNN is effectively infinite, and a direct impediment to practical use is selecting hyperparameters that achieve satisfactory performance at low computational cost. Meanwhile, many molecular datasets are far smaller than the datasets found in typical deep learning applications, and the efficiency of most hyperparameter optimization (HPO) methods on such small datasets in the molecular domain has not been explored. In this paper, we conduct a theoretical analysis of the common and distinguishing features of two state-of-the-art and popular HPO algorithms, the tree-structured Parzen estimator (TPE) and the covariance matrix adaptation evolution strategy (CMA-ES), and compare them with random search (RS), which serves as a baseline. We then carry out experimental studies on several benchmarks from MoleculeNet, examined from different perspectives, to investigate the impact of RS, TPE, and CMA-ES on HPO of GNNs for molecular property prediction. From our experiments, we conclude that RS, TPE, and CMA-ES each have individual advantages in tackling different specific molecular problems. Finally, we believe this work will motivate further research on GNNs applied to molecular machine learning problems in chemistry and materials science.
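To make the baseline concrete, the following is a minimal sketch of random search over a GNN-style hyperparameter space. The search space, the `toy_objective` function, and all parameter names are hypothetical stand-ins: a real study would train a GNN on a MoleculeNet split and return its validation metric instead.

```python
import random

# Hypothetical search space for GNN hyperparameters (illustrative only).
SPACE = {
    "hidden_dim": [32, 64, 128, 256],
    "num_layers": [2, 3, 4, 5],
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "dropout": [0.0, 0.1, 0.3, 0.5],
}

def toy_objective(cfg):
    # Stand-in for the validation error of a trained GNN; a real run would
    # train the model on a molecular dataset and return the measured metric.
    return (cfg["hidden_dim"] - 128) ** 2 / 1e4 + cfg["dropout"] + cfg["learning_rate"]

def random_search(n_trials, seed=0):
    # Sample n_trials configurations uniformly and keep the best one found.
    rng = random.Random(seed)
    best_cfg, best_score = None, float("inf")
    for _ in range(n_trials):
        cfg = {k: rng.choice(v) for k, v in SPACE.items()}
        score = toy_objective(cfg)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

best_cfg, best_score = random_search(50)
```

TPE and CMA-ES would plug into the same loop in place of the uniform sampling step: TPE by modeling the densities of good and bad configurations seen so far, and CMA-ES by adapting a multivariate Gaussian over a continuous encoding of the hyperparameters.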