Statistical uncertainty has many components, such as measurement errors, temporal variation, or sampling. Not all of these sources are relevant when considering a specific application, since practitioners might view some attributes of observations as fixed. We study the statistical inference problem arising when data is drawn conditionally on some attributes. These attributes are assumed to be sampled from a super-population but viewed as fixed when conducting uncertainty quantification. The estimand is thus defined as the parameter of a conditional distribution. We propose methods to construct conditionally valid p-values and confidence intervals for these conditional estimands based on asymptotically linear estimators. In this setting, a given estimator is conditionally unbiased for potentially many conditional estimands, which can be seen as parameters of different populations. Testing different populations raises questions of multiple testing. We discuss simple procedures that control novel conditional error rates. In addition, we introduce a bias correction technique that enables transfer of estimators across conditional distributions arising from the same super-population. This can be used to infer parameters and estimators on future datasets based on some new data. The validity and applicability of the proposed methods are demonstrated on simulated and real-world data.
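To make the distinction between super-population and conditional inference concrete, the following is a minimal sketch for the simplest asymptotically linear estimator, the sample mean. It is an illustration under stated assumptions, not the procedure proposed in the paper: the function name `mean_confidence_intervals`, the linear homoscedastic working model for E[Y | X], and the normal critical value are all assumptions introduced here. With the attributes X treated as random, the target is E[Y] and the full variance of Y enters the standard error. With X treated as fixed, the target becomes the conditional estimand (1/n) Σ E[Y | X_i], and only the noise around the regression function contributes.

```python
import numpy as np


def mean_confidence_intervals(x, y, z=1.96):
    """Return (unconditional_ci, conditional_ci) for the sample mean of y.

    Hypothetical helper for illustration only. The conditional interval
    assumes a linear, homoscedastic model for E[Y | X].
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.size
    estimate = y.mean()

    # Attributes random: target is E[Y], so the total variance of Y matters.
    se_uncond = y.std(ddof=1) / np.sqrt(n)

    # Attributes fixed: target is (1/n) * sum_i E[Y | x_i]. Conditionally on x,
    # Var(mean(y) | x) = sigma^2 / n, where sigma^2 is the residual variance.
    design = np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    sigma2_hat = residuals @ residuals / (n - 2)
    se_cond = np.sqrt(sigma2_hat / n)

    unconditional_ci = (estimate - z * se_uncond, estimate + z * se_uncond)
    conditional_ci = (estimate - z * se_cond, estimate + z * se_cond)
    return unconditional_ci, conditional_ci


# Usage on simulated data (the data-generating process is an assumption).
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 2.0 * x + rng.normal(size=500)
uncond_ci, cond_ci = mean_confidence_intervals(x, y)
print("unconditional:", uncond_ci)
print("conditional:  ", cond_ci)
```

Both intervals are centred at the same estimate, but they target different estimands. The conditional interval is shorter whenever X explains part of the variation in Y, because the sampling variability of the attributes is excluded from the uncertainty quantification.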