Accounting for multiple imputation-induced variability for differential analysis in mass spectrometry-based label-free quantitative proteomics

From arXiv. The methodology described here is implemented in R and is available on GitHub: https://github.com/mariechion/mi4p. The R scripts that produced the results presented here are in the same repository. The real datasets are available on ProteomeXchange under the dataset identifiers PXD003841 and PXD027800.

Imputing missing values is common practice in label-free quantitative proteomics. Imputation aims at replacing a missing value with a user-defined one. However, the imputation itself is often not properly accounted for downstream of the imputation process, as imputed datasets are treated as if they had always been complete. Hence, the uncertainty due to the imputation is not adequately taken into account. We provide a rigorous multiple imputation strategy, leading to a less biased estimation of the parameters' variability thanks to Rubin's rules. The resulting imputation-aware variance estimator of the peptide intensities is then moderated using Bayesian hierarchical models. This estimator is finally included in moderated t-test statistics to provide differential analysis results. This workflow can be applied to quantification datasets at both the peptide and the protein level. For protein-level results based on peptide-level quantification data, an aggregation step is also included. Our methodology, named mi4p, was compared to the state-of-the-art limma workflow implemented in the DAPAR R package, on both simulated and real datasets. We observed a trade-off between sensitivity and specificity, while mi4p outperforms DAPAR overall in terms of F-score.
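To make the role of Rubin's rules concrete, here is a minimal sketch of how estimates from several imputed datasets are typically pooled. It is written in Python for illustration only and is not the mi4p implementation, which is in R. The function name `rubin_pool` and its arguments are hypothetical. The pooled variance adds the between-imputation variance to the average within-imputation variance, which is how the uncertainty due to imputation enters the final estimate.

```python
from statistics import mean, variance


def rubin_pool(estimates, within_variances):
    """Pool per-imputation results with Rubin's rules (illustrative sketch).

    estimates        -- one point estimate per imputed dataset
    within_variances -- the estimated variance of each of those estimates
    Returns (pooled_estimate, total_variance).
    """
    d = len(estimates)
    if d < 2 or d != len(within_variances):
        raise ValueError("need at least two imputations of matching length")

    pooled_estimate = mean(estimates)   # average of the D point estimates
    within = mean(within_variances)     # average within-imputation variance
    between = variance(estimates)       # sample variance across imputations (D - 1 denominator)

    # Total variance: within + (1 + 1/D) * between
    total_variance = within + (1 + 1 / d) * between
    return pooled_estimate, total_variance


if __name__ == "__main__":
    # Made-up numbers for three imputed datasets, purely for illustration.
    print(rubin_pool([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))
```

With these made-up inputs the pooled estimate is 2.0 and the total variance is about 1.83, compared with 0.5 if the imputed datasets were treated as if they had always been complete.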
