A common statistical problem is inference from positive-valued multivariate measurements where the scale (e.g., sum) of the measurements is not representative of the scale (e.g., total size) of the system being studied. This situation is common in the analysis of modern sequencing data. The field of Compositional Data Analysis (CoDA) axiomatically states that analyses must be invariant to scale. Yet many scientific questions rely on the unmeasured system scale for identifiability. In practice, many existing tools instead make a wide variety of assumptions to identify models, often imputing the unmeasured scale. Here, we analyze the theoretical limits on inference given these data and formalize the assumptions required to provide principled scale-reliant inference. Using statistical concepts such as consistency and calibration, we provide guidance on how to make scale-reliant inference from these data. We prove that the Frequentist ideal is often unachievable and that existing methods can demonstrate bias and a breakdown of Type-I error control. We introduce scale simulation estimators and scale sensitivity analysis as a rigorous, flexible, and computationally efficient means of performing scale-reliant inference.