Many partial identification problems can be characterized by the optimal value of a function over a set, where both the function and the set must be estimated from empirical data. Despite some progress for convex problems, statistical inference in this general setting remains to be developed. To address this, we derive an asymptotically valid confidence interval for the optimal value through an appropriate relaxation of the estimated set. We then apply this general result to the problem of selection bias in population-based cohort studies. We show that existing sensitivity analyses, which are often conservative and difficult to implement, can be formulated in our framework and made significantly more informative via auxiliary information on the population. We conduct a simulation study to evaluate the finite-sample performance of our inference procedure and conclude with a substantive motivating example on the causal effect of education on income in the highly selected UK Biobank cohort. We demonstrate that our method can produce informative bounds using plausible population-level auxiliary constraints. We implement this method in the R package selectioninterval.