Structural fingerprints and pharmacophore modeling are methodologies that have been used for at least two decades in various fields of cheminformatics, from similarity searching to machine learning (ML). Advances in in silico techniques have led to combining these two methodologies into a new approach known as the pharmacophore fingerprint. Herein, we propose a high-resolution pharmacophore fingerprint called Pharmacoprint that encodes the presence, types, and relationships of the pharmacophore features of a molecule. Pharmacoprint was evaluated in classification experiments using ML algorithms (logistic regression, support vector machines, linear support vector machines, and neural networks) and outperformed other popular molecular fingerprints (i.e., EState, MACCS, PubChem, Substructure, Klekota-Roth, CDK, Extended, and GraphOnly) as well as the ChemAxon Pharmacophoric Features fingerprint. Pharmacoprint consists of 39,973 bits; several methods were applied for dimensionality reduction, and the best algorithm not only shortened the bit string but also improved the efficiency of the ML tests. Further optimization allowed us to define the best parameter settings for using Pharmacoprint in discrimination tests and for maximizing the statistical parameters. Finally, Pharmacoprint generated from 3D structures with explicitly defined hydrogens was used as input to neural networks with a supervised autoencoder that selected the most important bits, raising the Matthews Correlation Coefficient (MCC) to as high as 0.962. The results show the potential of Pharmacoprint as a new, promising tool for computer-aided drug design.
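To illustrate how a single bit string can encode the presence, types, and pairwise relationships of pharmacophore features, here is a minimal Python sketch of a generic pair-based pharmacophore fingerprint. It is not Pharmacoprint's actual encoding: the feature vocabulary (`FEATURE_TYPES`), the distance bins in angstroms (`BIN_EDGES`), and the function `pair_fingerprint` are hypothetical choices made for illustration, and the resulting string has 153 bits rather than 39,973.

```python
import itertools
import math

# Hypothetical feature vocabulary (not Pharmacoprint's actual feature set).
FEATURE_TYPES = ["donor", "acceptor", "aromatic", "hydrophobic", "cation", "anion"]
# Hypothetical upper edges of distance bins, in angstroms.
BIN_EDGES = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

N_TYPES = len(FEATURE_TYPES)
N_BINS = len(BIN_EDGES) + 1  # extra bin for distances beyond the last edge
# Unordered type pairs, including same-type pairs such as (donor, donor).
TYPE_PAIRS = list(itertools.combinations_with_replacement(range(N_TYPES), 2))
PAIR_INDEX = {pair: i for i, pair in enumerate(TYPE_PAIRS)}
N_BITS = N_TYPES + len(TYPE_PAIRS) * N_BINS


def distance_bin(d):
    """Index of the first bin whose upper edge is >= d; longer distances share the last bin."""
    for i, edge in enumerate(BIN_EDGES):
        if d <= edge:
            return i
    return len(BIN_EDGES)


def pair_fingerprint(features):
    """features: list of (type_name, (x, y, z)) tuples from a 3D structure.

    Returns a list of 0/1 values of length N_BITS.
    """
    bits = [0] * N_BITS
    typed = [(FEATURE_TYPES.index(name), xyz) for name, xyz in features]
    # Presence bits: one bit per feature type found in the molecule.
    for t, _ in typed:
        bits[t] = 1
    # Relationship bits: one bit per (type pair, distance bin) combination.
    for (ta, a), (tb, b) in itertools.combinations(typed, 2):
        pair = (min(ta, tb), max(ta, tb))
        idx = N_TYPES + PAIR_INDEX[pair] * N_BINS + distance_bin(math.dist(a, b))
        bits[idx] = 1
    return bits


if __name__ == "__main__":
    fp = pair_fingerprint([
        ("donor", (0.0, 0.0, 0.0)),
        ("acceptor", (3.0, 0.0, 0.0)),
        ("aromatic", (0.0, 4.0, 0.0)),
    ])
    print([i for i, b in enumerate(fp) if b])
```

Adding finer distance bins, more feature types, or feature triplets enlarges the bit string quickly, which is one reason a high-resolution fingerprint of this kind benefits from dimensionality reduction before ML.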