Many modern applications seek to understand the relationship between an outcome variable $Y$ and a covariate $X$ in the presence of a (possibly high-dimensional) confounding variable $Z$. Although much attention has been paid to testing whether $Y$ depends on $X$ given $Z$, in this paper we seek to go beyond testing by inferring the strength of that dependence. We first define our estimand, the minimum mean squared error (mMSE) gap, which quantifies the conditional relationship between $Y$ and $X$ in a way that is deterministic, model-free, interpretable, and sensitive to nonlinearities and interactions. We then propose a new inferential approach called floodgate that can leverage any working regression function chosen by the user (which may, e.g., be fitted by a state-of-the-art machine learning algorithm or derived from qualitative domain knowledge) to construct asymptotic confidence bounds, and we apply it to the mMSE gap. In addition to proving floodgate's asymptotic validity, we rigorously quantify its accuracy (the distance from the confidence bound to the estimand) and its robustness. We demonstrate floodgate's performance in a series of simulations and apply it to data from the UK Biobank to infer the strength of dependence of platelet count on various groups of genetic mutations.
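As a sketch of what the estimand looks like in symbols: taking the minimum mean squared error to be the error attained by the conditional mean, and assuming $Y$ has a finite second moment, the mMSE gap can be written as below. The symbol $\mathcal{I}^2$ is notation introduced here for illustration; it is not fixed by the text above.

$$
\mathcal{I}^2 \;=\; \mathbb{E}\!\left[\big(Y - \mathbb{E}[Y \mid Z]\big)^2\right] \;-\; \mathbb{E}\!\left[\big(Y - \mathbb{E}[Y \mid X, Z]\big)^2\right] \;=\; \mathbb{E}\!\left[\operatorname{Var}\!\big(\mathbb{E}[Y \mid X, Z] \,\big|\, Z\big)\right].
$$

The second equality follows from the law of total variance applied conditionally on $Z$. In this form the gap is nonnegative, and it is zero when $\mathbb{E}[Y \mid X, Z]$ does not vary with $X$ given $Z$.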