Parameters of sub-populations can be more relevant than those of a super-population. For example, a healthcare provider may be interested in the effect of a treatment plan for a specific subset of their patients; policymakers may be concerned with the impact of a policy in a particular state within a given population. In these cases, the focus is on a specific finite population, as opposed to an infinite super-population. Such a population can be characterized by fixing some attributes that are intrinsic to it, while treating unexplained variation, such as measurement error, as random. Inference for a population with fixed attributes can then be modeled as inferring parameters of a conditional distribution. Accordingly, it is desirable that confidence intervals be conditionally valid for the realized population, rather than valid only marginally, averaging over many possible draws of populations. We provide a statistical inference framework for parameters of finite populations with known attributes. Leveraging the attribute information, our estimators and confidence intervals closely target a specific finite population. When the data come from the population of interest, our confidence intervals attain asymptotic conditional validity given the attributes, and are shorter than those for super-population inference. In addition, we develop procedures to infer parameters of new populations with differing covariate distributions; the confidence intervals are also conditionally valid for the new populations under mild conditions. Our methods extend to situations where the fixed information has a weaker structure or is only partially observed. We demonstrate the validity and applicability of our methods using simulated and real-world data.
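To make the contrast between conditional and super-population intervals concrete, the following is a minimal toy sketch and not the estimator proposed in this work. It assumes a hypothetical model y_i = 2 x_i + e_i with homoskedastic noise, treats the attributes x_i as fixed, and takes the finite-population target to be the average of E[y_i | x_i] over the realized x_i. The function name `simulate`, the linear working model, and all constants are assumptions made only for illustration. Under these assumptions, the interval that conditions on the attributes needs to account only for the noise variance, while the super-population interval also pays for the variation of the attributes across hypothetical population draws.

```python
import math
import random
import statistics


def simulate(n=2000, seed=0):
    """Toy comparison of a conditional (finite-population) interval and a
    super-population interval for a mean. All modeling choices here are
    illustrative assumptions, not the method of the paper."""
    rng = random.Random(seed)

    # Attributes of the realized population (held fixed after this draw).
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Outcomes: assumed linear signal plus measurement noise with variance 1.
    y = [2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

    y_bar = statistics.fmean(y)
    x_bar = statistics.fmean(x)

    # Finite-population target: average conditional mean over the fixed x_i.
    # (Known here only because the toy model is known.)
    theta_finite = statistics.fmean(2.0 * xi for xi in x)

    # Ordinary least squares fit of y on x, used only to estimate the
    # noise variance; this assumes the linear working model is correct.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = rss / (n - 2)

    # Conditional standard error of y_bar given the attributes: noise only.
    se_cond = math.sqrt(sigma2_hat / n)
    # Super-population standard error: noise plus variation in the attributes.
    se_super = statistics.stdev(y) / math.sqrt(n)

    z = 1.96  # approximate 95% normal quantile
    return {
        "y_bar": y_bar,
        "theta_finite": theta_finite,
        "se_cond": se_cond,
        "se_super": se_super,
        "ci_cond": (y_bar - z * se_cond, y_bar + z * se_cond),
        "ci_super": (y_bar - z * se_super, y_bar + z * se_super),
    }


if __name__ == "__main__":
    for key, value in simulate().items():
        print(key, value)
```

In this toy setting the conditional interval is centered at the same point estimate but is narrower, because the residual variance never exceeds the total variance of y. Whether such an interval is valid in practice depends on the conditions established in the paper, which this sketch does not attempt to reproduce.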