Predicting the binding of viral peptides to the major histocompatibility complex with machine learning could extend the computational immunology toolkit for vaccine development and serve as a key component in the fight against a pandemic. In this work, we adapt and extend USMPep, a recently proposed, conceptually simple prediction algorithm based on recurrent neural networks. Most notably, we combine regressors (trained on binding affinity data) and classifiers (trained on mass spectrometry data) from qualitatively different data sources to obtain a more comprehensive prediction tool. We evaluate its performance on a recently released SARS-CoV-2 dataset with binding stability measurements. USMPep not only sets new benchmarks on selected single alleles but is also consistently among the best-performing methods and, for some metrics, the best-performing method overall for this task.