Nanobodies are small, camelid-derived antibody fragments that selectively bind to antigens. Their marked physicochemical properties make them attractive for advanced therapeutics, including treatments for SARS-CoV-2. To realize this potential, bottom-up proteomics via liquid chromatography-tandem mass spectrometry (LC-MS/MS) has been proposed to identify antigen-specific nanobodies at the proteome scale. A critical component of this pipeline is matching nanobody peptides to the tandem mass spectra they generate. Peptide-spectrum matching is a well-studied problem, but we show that the sequence similarity between nanobody peptides violates key assumptions needed to infer nanobody peptide-spectrum matches (PSMs) with the standard target-decoy paradigm, and we prove that these violations lead to inflated error rates. To address these issues, we develop a novel framework and method that treat peptide-spectrum matching as a Bayesian model selection problem with an incomplete model space. To our knowledge, they are the first to account for all sources of PSM error without relying on the aforementioned assumptions. Beyond demonstrating our method's improved performance on simulated and real nanobody data, our work shows how to leverage novel retention time and spectrum prediction tools to build accurate and discriminating data-generating models and, to our knowledge, provides the first rigorous description of MS/MS spectrum noise.
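For readers unfamiliar with the standard target-decoy paradigm mentioned above, the sketch below shows the generic decoy-counting estimate of the false discovery rate (FDR) and the q-values derived from it. It is the common textbook procedure, not the authors' method; the function names and the `(score, is_decoy)` input format are assumptions made for illustration. The comment in `target_decoy_fdr` names one commonly stated assumption behind the estimate: incorrect target matches are taken to score like decoy matches.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: one (score, is_decoy) pair per PSM; higher score = better match.
PSM = Tuple[float, bool]


def target_decoy_fdr(psms: List[PSM], threshold: float) -> float:
    """Estimate the FDR among target PSMs scoring at or above `threshold`.

    Uses the simple ratio #decoys / #targets above the threshold. This relies on
    the assumption that incorrect target matches score like decoy matches.
    """
    n_target = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    n_decoy = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if n_target == 0:
        return 0.0
    return min(1.0, n_decoy / n_target)


def q_values(psms: List[PSM]) -> Dict[float, float]:
    """Map each target score to its q-value: the smallest estimated FDR
    at any threshold that still accepts a PSM with that score."""
    # A PSM with score s is accepted at every threshold t <= s, so scan
    # target scores from lowest to highest and keep a running minimum.
    target_scores = sorted({score for score, is_decoy in psms if not is_decoy})
    result: Dict[float, float] = {}
    running_min = float("inf")
    for t in target_scores:
        running_min = min(running_min, target_decoy_fdr(psms, t))
        result[t] = running_min
    return result


if __name__ == "__main__":
    psms = [(10, False), (9, False), (8, True), (7, False), (6, False), (5, True), (4, True)]
    print(target_decoy_fdr(psms, 7))  # 1 decoy / 3 targets
    print(q_values(psms))
```

The running minimum makes q-values monotone in the score, so a PSM can inherit a lower FDR from a more lenient threshold. When incorrect target matches are systematically more similar to true matches than decoys are, as the abstract argues happens for nanobody peptides, the decoy count underestimates the number of incorrect targets and this estimate becomes optimistic.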