Drug resistance is a major threat to global health and a significant concern throughout clinical treatment and drug development. Mutations in proteins involved in drug binding are a common cause of adaptive drug resistance. Quantitative estimates of how mutations affect the interaction between a drug and its target protein are therefore of vital importance for drug development and clinical practice. Computational methods based on molecular dynamics simulations, Rosetta protocols, and machine learning have proven capable of predicting changes in ligand affinity upon protein mutation. However, severely limited sample sizes and heavy noise lead to overfitting and poor generalization, which has impeded the wide adoption of machine learning for studying drug resistance. In this paper, we propose a robust machine learning method, termed SPLDExtraTrees, which can accurately predict ligand binding affinity changes upon protein mutation and identify resistance-causing mutations. Specifically, the proposed method ranks training data following a scheme that starts with easy-to-learn samples and gradually incorporates harder and more diverse samples into training, iterating between sample weight recalculation and model updates. In addition, we calculate physics-based structural features to provide the machine learning model with valuable domain knowledge about proteins for this data-limited predictive task. Experiments substantiate the capability of the proposed method for predicting kinase inhibitor resistance under three scenarios; it achieves predictive accuracy comparable to that of molecular dynamics and Rosetta methods at much lower computational cost.
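The training scheme described in the abstract (start from easy samples, admit harder and more diverse ones, alternate between weight recalculation and model updates) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the selection threshold follows the commonly cited self-paced-learning-with-diversity form, a weighted line fit stands in for the tree ensemble so the example needs only the standard library, and the names `fit_weighted_line`, `spld_weights`, `self_paced_fit`, and the parameters `lam`, `gamma`, `growth` are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Model = Callable[[float], float]


def fit_weighted_line(xs: Sequence[float], ys: Sequence[float],
                      ws: Sequence[float]) -> Model:
    """Stand-in base learner (assumption): weighted least squares for y = a*x + b."""
    total = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def spld_weights(losses: Sequence[float], groups: Sequence[int],
                 lam: float, gamma: float) -> List[float]:
    """Select easy samples per group; gamma rewards spreading picks across groups."""
    weights = [0.0] * len(losses)
    by_group: Dict[int, List[int]] = {}
    for idx, group in enumerate(groups):
        by_group.setdefault(group, []).append(idx)
    for members in by_group.values():
        members.sort(key=lambda i: losses[i])  # easiest sample first
        for rank, idx in enumerate(members, start=1):
            # The threshold shrinks with rank inside a group, so the first
            # picks from a new group are favoured over many picks from one group.
            threshold = lam + gamma / (math.sqrt(rank) + math.sqrt(rank - 1))
            if losses[idx] < threshold:
                weights[idx] = 1.0
    return weights


def self_paced_fit(xs: Sequence[float], ys: Sequence[float], groups: Sequence[int],
                   fit: Callable[..., Model] = fit_weighted_line,
                   lam: float = 20.0, gamma: float = 10.0,
                   growth: float = 1.5, rounds: int = 5) -> Tuple[Model, List[float]]:
    """Alternate between recomputing sample weights and refitting the model."""
    weights = [1.0] * len(xs)
    model = fit(xs, ys, weights)  # initial model trained on all samples
    for _ in range(rounds):
        losses = [(model(x) - y) ** 2 for x, y in zip(xs, ys)]
        new_weights = spld_weights(losses, groups, lam, gamma)
        if sum(new_weights) >= 2:  # a line needs at least two selected points
            weights = new_weights
            model = fit(xs, ys, weights)
        lam *= growth    # relax the thresholds so harder samples
        gamma *= growth  # can enter in later rounds
    return model, weights


# Toy data (made up): y = 2x + 1 with two corrupted labels at x = 3 and x = 7.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
ys[3], ys[7] = 30.0, -20.0
groups = [0] * 5 + [1] * 5  # hypothetical grouping, e.g. by cluster
model, weights = self_paced_fit(xs, ys, groups)
print(weights)
```

On this toy data the two corrupted labels keep weight zero in every round, so the final fit is driven by the clean samples only. To move closer to the setting in the abstract, `fit` could be replaced by a wrapper around a tree ensemble that accepts per-sample weights, for example scikit-learn's `ExtraTreesRegressor.fit(X, y, sample_weight=...)`; the grouping used for the diversity term would then have to come from the data, which this sketch does not attempt.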