It has been argued that parameters characterizing sub-populations can be more relevant than super-population parameters. For example, a video subscription service might be interested in estimating the satisfaction of its current customers, as opposed to that of a hypothetical infinite super-population. In this case, the customers might be viewed as fixed, while the satisfaction measurements might be random due to measurement noise and temporal variation. More generally, inference for populations with fixed attributes can be modeled as inferring parameters of conditional distributions given these attributes. Since the data for such a sub-population are drawn from a conditional distribution, it is desirable that confidence intervals have conditional coverage guarantees, as opposed to marginal coverage guarantees. We provide a framework for statistical inference on parameters of sub-populations with fixed attributes. We construct confidence intervals that attain asymptotic validity conditional on the attributes. In addition, we develop a set of tools to infer the parameters of new populations with observed attributes under covariate shift; these confidence intervals also attain asymptotic conditional validity under mild conditions. The validity and applicability of the proposed methods are demonstrated on simulated and real-world data.
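The distinction between conditional and marginal targets can be illustrated with a small simulation. The sketch below is not the framework proposed here; it is a toy setup in which every specific is an assumption made for illustration: a linear signal `2 * x`, Gaussian noise with a known standard deviation `sigma`, and the function name `simulate_coverage`. The attributes are drawn once and then held fixed, and only the measurement noise is redrawn. The target is the sub-population mean given those attributes. An interval whose width uses only the noise variance should cover that target at roughly the nominal rate, while the textbook interval based on the total sample standard deviation is far wider than needed for it.

```python
import numpy as np


def simulate_coverage(n=200, reps=2000, sigma=1.0, seed=0):
    """Toy comparison of two 95% intervals for a sub-population mean.

    All modeling choices here are illustrative assumptions, not the
    method described in the abstract.
    """
    rng = np.random.default_rng(seed)

    # Attributes are drawn once and then treated as fixed.
    x = rng.normal(size=n)
    signal = 2.0 * x  # assumed conditional mean of each measurement

    # Conditional target: mean outcome of this fixed sub-population.
    theta_cond = signal.mean()

    # Redraw only the measurement noise in each replication.
    y = signal + sigma * rng.normal(size=(reps, n))
    ybar = y.mean(axis=1)

    z = 1.96  # approximate 97.5% standard normal quantile

    # Interval using only the noise variance (sigma assumed known).
    half_cond = z * sigma / np.sqrt(n)
    # Textbook interval using the total sample standard deviation,
    # which also absorbs the spread of the fixed attributes.
    half_marg = z * y.std(axis=1, ddof=1) / np.sqrt(n)

    cover_cond = np.mean(np.abs(ybar - theta_cond) <= half_cond)
    cover_marg = np.mean(np.abs(ybar - theta_cond) <= half_marg)
    return float(cover_cond), float(cover_marg)


if __name__ == "__main__":
    cover_cond, cover_marg = simulate_coverage()
    print(f"noise-variance interval coverage: {cover_cond:.3f}")
    print(f"total-variance interval coverage: {cover_marg:.3f}")
```

In this toy setting the second interval over-covers the sub-population mean because its width reflects variation in the attributes, which is not a source of randomness once the attributes are fixed. In practice the noise level would have to be estimated rather than assumed known.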