Many modern applications seek to understand the relationship between an outcome variable $Y$ and a covariate $X$ in the presence of a (possibly high-dimensional) confounding variable $Z$. Although much attention has been paid to testing whether $Y$ depends on $X$ given $Z$, in this paper we go beyond testing by inferring the strength of that dependence. We first define our estimand, the minimum mean squared error (mMSE) gap, which quantifies the conditional relationship between $Y$ and $X$ in a way that is deterministic, model-free, interpretable, and sensitive to nonlinearities and interactions. We then propose a new inferential approach called floodgate, which can leverage any working regression function chosen by the user (for example, one fitted by a state-of-the-art machine learning algorithm or derived from qualitative domain knowledge) to construct asymptotic confidence bounds, and we apply it to the mMSE gap. In addition to proving floodgate's asymptotic validity, we rigorously quantify its accuracy (the distance from the confidence bound to the estimand) and its robustness. We then show that the same floodgate principle applies to a different measure of variable importance when $Y$ is binary. Finally, we demonstrate floodgate's performance in a series of simulations and apply it to data from the UK Biobank to infer the strength of dependence of platelet count on various groups of genetic mutations.
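As a sketch of the estimand described above, one natural way to write the mMSE gap is as the reduction in the best achievable mean squared error for predicting $Y$ when $X$ is made available in addition to $Z$. The symbol $\mathcal{I}^2$ is notation assumed here for illustration; the abstract itself does not fix one, and finite second moments of $Y$ are assumed.

$$
\mathcal{I}^2 \;=\; \mathbb{E}\big[(Y - \mathbb{E}[Y \mid Z])^2\big] \;-\; \mathbb{E}\big[(Y - \mathbb{E}[Y \mid X, Z])^2\big] \;=\; \mathbb{E}\big[\operatorname{Var}\big(\mathbb{E}[Y \mid X, Z] \,\big|\, Z\big)\big].
$$

Under this reading the quantity is nonnegative, is zero whenever $Y$ is conditionally independent of $X$ given $Z$, and depends only on the joint distribution of $(Y, X, Z)$ rather than on any fitted model, which is consistent with the "deterministic" and "model-free" properties named above.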