Many modern applications seek to understand the relationship between an outcome variable $Y$ and a covariate $X$ in the presence of a (possibly high-dimensional) confounding variable $Z$. Although much attention has been paid to testing \emph{whether} $Y$ depends on $X$ given $Z$, in this paper we go beyond testing by inferring the \emph{strength} of that dependence. We first define our estimand, the minimum mean squared error (mMSE) gap, which quantifies the conditional relationship between $Y$ and $X$ in a way that is deterministic, model-free, interpretable, and sensitive to nonlinearities and interactions. We then propose a new inferential approach called \emph{floodgate} that can leverage any working regression function chosen by the user (which may, e.g., be fitted by a state-of-the-art machine learning algorithm or derived from qualitative domain knowledge) to construct asymptotic confidence bounds, and we apply it to the mMSE gap. \acc{We additionally show that floodgate's accuracy (the distance from the confidence bound to the estimand) adapts to the error of the working regression function.} We then show that the same floodgate principle can be applied to a different measure of variable importance when $Y$ is binary. Finally, we demonstrate floodgate's performance in a series of simulations and apply it to data from the UK Biobank to infer the strength of dependence of platelet count on various groups of genetic mutations.
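
As a sketch of what the estimand's name suggests (the symbol $\mathcal{I}^2$ and the exact form below are assumptions, not stated in the abstract), the mMSE gap can be read as the reduction in the smallest achievable mean squared error for predicting $Y$ when $X$ is added to $Z$, assuming $\mathbb{E}[Y^2] < \infty$:

\[
\mathcal{I}^2
  \;=\; \mathbb{E}\!\left[\bigl(Y - \mathbb{E}[Y \mid Z]\bigr)^2\right]
  \;-\; \mathbb{E}\!\left[\bigl(Y - \mathbb{E}[Y \mid X, Z]\bigr)^2\right]
  \;=\; \mathbb{E}\!\left[\operatorname{Var}\bigl(\mathbb{E}[Y \mid X, Z] \,\big|\, Z\bigr)\right].
\]

Under this reading, the second equality follows from the tower property, and $\mathcal{I}^2 = 0$ exactly when $\mathbb{E}[Y \mid X, Z] = \mathbb{E}[Y \mid Z]$ almost surely, that is, when $X$ does not improve the best mean-squared-error prediction of $Y$ beyond what $Z$ already provides.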