Large-scale hypothesis testing has become a ubiquitous problem in high-dimensional statistical inference, with applications across many scientific disciplines. One such application is imaging mass spectrometry (IMS) association studies, in which a large number of tests are performed simultaneously to identify molecular masses that are associated with a particular phenotype, e.g., a cancer subtype. Mass spectra obtained from matrix-assisted laser desorption/ionization (MALDI) experiments are dependent when considered as statistical quantities. Control of the false discovery proportion (FDP) under an arbitrary dependency structure among the test statistics is an active topic in modern multiple testing research. In this context, we are concerned with evaluating associations between a binary outcome variable (describing the phenotype) and multiple predictors derived from MALDI measurements. We propose an inference procedure that utilizes the correlation matrix of the test statistics. The approach is based on multiple marginal models (MMM). Specifically, we fit a marginal logistic regression model for each predictor individually. Asymptotic joint normality of the stacked vector of marginal regression coefficients is established under standard regularity assumptions, and their (limiting) correlation matrix is estimated. The proposed method then extracts common factors from the resulting empirical correlation matrix. Finally, we estimate the realized FDP of a thresholding procedure for the marginal $p$-values. We demonstrate a practical application of the proposed workflow to MALDI IMS data in an oncological context.
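The workflow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the Newton-Raphson fitting loop, the influence-function (sandwich-type) estimate of the correlation matrix, the number of factors `k`, the `null_frac` tuning constant, and the principal-factor formula used for the FDP estimate are all assumptions chosen for illustration, since the abstract does not specify these details.

```python
import numpy as np
from statistics import NormalDist

_norm = NormalDist()
_cdf = np.vectorize(_norm.cdf)  # elementwise standard normal CDF


def fit_marginal_logit(x, y, n_iter=25):
    """Newton-Raphson fit of logit P(y=1 | x) = a + b*x for one predictor.

    Returns the slope estimate and its per-observation influence values.
    """
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(info, X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    # Influence of observation i on the estimate: info^{-1} @ score_i
    influence = np.linalg.solve(info, (X * (y - p)[:, None]).T).T
    return beta[1], influence[:, 1]


def marginal_z_and_correlation(X, y):
    """Fit one marginal model per column of X and stack the slope estimates.

    The joint covariance of the slopes is estimated from the stacked
    influence values (an assumed sandwich-type estimator).
    """
    fits = [fit_marginal_logit(X[:, j], y) for j in range(X.shape[1])]
    b_hat = np.array([f[0] for f in fits])
    psi = np.column_stack([f[1] for f in fits])  # shape (n, m)
    cov = psi.T @ psi
    se = np.sqrt(np.diag(cov))
    return b_hat / se, cov / np.outer(se, se)


def estimate_fdp(z, corr, t, k=2, null_frac=0.9):
    """Principal-factor style FDP estimate for rejecting when p-value <= t.

    k and null_frac are hypothetical tuning choices.
    """
    eigval, eigvec = np.linalg.eigh(corr)
    top = np.argsort(eigval)[::-1][:k]
    # Loadings of the k leading common factors, shape (m, k)
    B = eigvec[:, top] * np.sqrt(np.clip(eigval[top], 0.0, None))
    # Estimate the realized factors from the smallest |z| (assumed mostly null)
    keep = np.argsort(np.abs(z))[: int(null_frac * len(z))]
    w_hat, *_ = np.linalg.lstsq(B[keep], z[keep], rcond=None)
    eta = B @ w_hat
    # Inverse standard deviation of the part not explained by the factors
    a = 1.0 / np.sqrt(np.clip(1.0 - np.sum(B**2, axis=1), 1e-8, None))
    z_t = _norm.inv_cdf(t / 2.0)
    expected_false = np.sum(_cdf(a * (z_t + eta)) + _cdf(a * (z_t - eta)))
    p_values = 2.0 * _cdf(-np.abs(z))
    n_rejected = int(np.sum(p_values <= t))
    return min(1.0, expected_false / max(n_rejected, 1)), n_rejected


# Simulated stand-in for MALDI-derived predictors (not real data)
rng = np.random.default_rng(0)
n, m = 300, 40
X_sim = rng.normal(size=(n, m)) + 0.6 * rng.normal(size=(n, 1))  # shared factor
prob = 1.0 / (1.0 + np.exp(-(X_sim[:, 0] + X_sim[:, 1] + X_sim[:, 2])))
y_sim = (rng.random(n) < prob).astype(float)

z_sim, corr_sim = marginal_z_and_correlation(X_sim, y_sim)
fdp_hat, n_rej = estimate_fdp(z_sim, corr_sim, t=0.01)
print(f"rejections: {n_rej}, estimated FDP: {fdp_hat:.3f}")
```

The shared column added to `X_sim` induces dependence among the predictors, which is the situation the correlation-based procedure is meant to handle. In practice the number of factors and the fraction of statistics treated as null would need to be chosen with care.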