Conditional Inference: Towards a Hierarchy of Statistical Evidence

Statistical uncertainty has many sources. P-values and confidence intervals usually quantify the overall uncertainty, which may include variation due to sampling and uncertainty due to measurement error, among others. Practitioners might be interested in quantifying only one source of uncertainty. For example, one might be interested in the uncertainty of a regression coefficient for a fixed set of subjects, which corresponds to quantifying the uncertainty due to measurement error while ignoring the variation induced by sampling. In causal inference, it is common to infer treatment effects for a certain set of subjects, accounting only for uncertainty due to random treatment assignment. Motivated by these examples, we consider conditional estimation and conditional inference for parameters in parametric and semi-parametric models, where we condition on observed characteristics of a population. We derive a theory of conditional inference, including methods to obtain conditionally valid p-values and confidence intervals. Conditional p-values can be used to construct a hierarchy of statistical evidence that may help clarify the generalizability of a statistical finding. We show that a naive method allows one to gauge the generalizability of a finding, with rigorous control of the family-wise error rate. In addition, the proposed approach allows one to conduct transfer learning of conditional parameters, with rigorous conditional guarantees. The performance of the proposed approach is evaluated on simulated and real-world data.
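To make the causal inference example concrete, the sketch below runs a classical Fisher randomization test, in which the subjects and their outcomes are held fixed and the only randomness is the treatment assignment. It is an illustration of conditioning on a fixed set of subjects, not the method proposed here; the function name `randomization_p_value` and its parameters are hypothetical, and it assumes a completely randomized design with a sharp null of no effect for any subject.

```python
import numpy as np


def randomization_p_value(outcomes, treated, n_draws=10_000, seed=0):
    """Monte Carlo two-sided randomization p-value (hypothetical helper).

    Assumes a completely randomized design and the sharp null hypothesis
    that treatment has no effect on any subject. The subjects and their
    outcomes are treated as fixed; only the assignment is re-drawn.
    """
    rng = np.random.default_rng(seed)
    outcomes = np.asarray(outcomes, dtype=float)
    treated = np.asarray(treated, dtype=bool)

    def diff_in_means(assignment):
        # Test statistic: treated mean minus control mean.
        return outcomes[assignment].mean() - outcomes[~assignment].mean()

    observed = abs(diff_in_means(treated))
    exceed = 0
    for _ in range(n_draws):
        # Shuffling keeps the number of treated subjects fixed.
        shuffled = rng.permutation(treated)
        if abs(diff_in_means(shuffled)) >= observed - 1e-12:
            exceed += 1
    # The +1 terms count the observed assignment itself, which keeps
    # the Monte Carlo p-value valid.
    return (1 + exceed) / (1 + n_draws)
```

Because the outcomes are never resampled, the resulting p-value reflects uncertainty from treatment assignment alone and says nothing by itself about subjects outside the observed set.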
