Many scientific fields, including human gut microbiome science, collect multivariate count data in which the sum of the counts is unrelated to the scale of the underlying system being measured (e.g., the total microbial load in a subject's colon). This disconnect complicates downstream analyses such as differential analysis in case-control studies. This article is motivated by a novel study of in vitro human gut microbiome models, in which popular tools for analyzing these data led to dramatically elevated rates of both false positives and false negatives. To understand those failures, we provide a formal problem statement that frames these challenges of scale in terms of the classical theory of identifiability. We call this the problem of Scale Reliant Inference (SRI). We use this formulation to prove fundamental limits on SRI in terms of criteria such as consistency and type-I error control. We show that the failures of existing methods stem from their inability to properly quantify uncertainty in the system scale. We demonstrate that a particular type of Bayesian model, called a Bayesian Partially Identified Model (PIM), can correctly quantify uncertainty in SRI. We introduce Scale Simulation Random Variables (SSRVs) as a flexible and efficient approach to specifying and inferring Bayesian PIMs. In both real and simulated data, we find that SSRVs drastically decrease type-I and type-II error rates.
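The abstract does not spell out how SSRVs are constructed. The following is only a minimal Python sketch of the general idea, under stated assumptions, and not the authors' implementation. Sequencing counts inform the composition of each system, but not its total scale. So the sketch draws compositions from a Dirichlet posterior and draws the log ratio of total microbial load between two conditions from an assumed prior. It then combines them into log fold changes (LFCs) of absolute abundance. The function name `scale_simulation_lfc`, the 0.5 pseudocount, and the Normal(0, `scale_sd`) scale prior are illustrative assumptions.

```python
import numpy as np


def scale_simulation_lfc(counts_a, counts_b, scale_sd=0.5, n_draws=4000, seed=0):
    """Hypothetical sketch of a scale-simulation estimate of log fold changes.

    counts_a, counts_b: 1-D count vectors (one entry per taxon) for two conditions.
    scale_sd: standard deviation of an ASSUMED Normal(0, scale_sd) prior on the
        log ratio of total microbial load (condition b over condition a).
    Returns an (n_draws, n_taxa) array of posterior draws of absolute-abundance LFCs.
    """
    rng = np.random.default_rng(seed)
    counts_a = np.asarray(counts_a, dtype=float)
    counts_b = np.asarray(counts_b, dtype=float)
    # Composition: the part the counts actually inform. Dirichlet posterior with
    # an assumed pseudocount of 0.5 added to every taxon.
    comp_a = rng.dirichlet(counts_a + 0.5, size=n_draws)
    comp_b = rng.dirichlet(counts_b + 0.5, size=n_draws)
    # Scale: not identified by the counts, so it is drawn from a prior instead.
    log_scale_ratio = rng.normal(0.0, scale_sd, size=(n_draws, 1))
    # Absolute abundance = scale * composition, so on the log scale the LFC is the
    # compositional log ratio plus the log ratio of the two system scales.
    return np.log(comp_b) - np.log(comp_a) + log_scale_ratio


# Usage: compare 95% intervals with and without uncertainty in the scale.
a = [500, 300, 200]
b = [250, 300, 450]
for sd in (0.0, 1.0):
    draws = scale_simulation_lfc(a, b, scale_sd=sd)
    lo, hi = np.percentile(draws, [2.5, 97.5], axis=0)
    print(f"scale_sd={sd}:", np.round(lo, 2), np.round(hi, 2))
```

Setting `scale_sd=0.0` mimics an analysis that implicitly treats the scale as known and unchanged between conditions. Its intervals reflect only compositional uncertainty and can be overconfident if the total load actually differs. A nonzero `scale_sd` widens every interval to reflect that the scale is not identified by the counts.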