This study assesses the performance of several popular machine learning approaches for predicting molecular binding affinity: CatBoost, Graph Attention Neural Network, and Bidirectional Encoder Representations from Transformers (BERT). The models were trained to predict binding affinities, expressed as inhibition constants $K_i$, for pairs of proteins and small organic molecules. The first two approaches use carefully selected physico-chemical features, while the third is based on textual molecular representations; it is one of the first attempts to apply Transformer-based predictors to binding affinity. We also discuss visualizing the attention layers of the Transformer approach to highlight the molecular sites responsible for interactions. None of the approaches relies on atomic spatial coordinates, which avoids bias from known structures and allows generalization to compounds with unknown conformations. The accuracy achieved by all of the suggested approaches demonstrates their potential for high-throughput screening.