Investigating the significance of a subset of covariates W for the response Y given covariates Z is of great importance in regression modeling. To this end, we propose a new significance test for the partial mean independence problem based on deep neural networks and data splitting. The test statistic converges to the standard chi-squared distribution under the null hypothesis and to a normal distribution under the alternative hypothesis. We also suggest a powerful ensemble algorithm based on multiple data splitting to enhance the testing power. If the null hypothesis is rejected, we propose a new partial Generalized Measure of Correlation (pGMC) to measure the partial mean dependence of Y on W after controlling for the nonlinear effect of Z, which is an interesting extension of the GMC proposed by Zheng et al. (2012). We present the appealing theoretical properties of the pGMC and establish the asymptotic normality of its estimator with the optimal root-N convergence rate. Furthermore, a valid confidence interval for the pGMC is derived. As an important special case in which there are no conditioning covariates Z, we also consider a new test of the overall significance of covariates for the response in a model-free setting. We further introduce a new estimator of the GMC and derive its asymptotic normality. Numerical studies and real data analysis are conducted to compare with existing approaches and to illustrate the validity and flexibility of our proposed procedures.
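
For orientation, the quantities named in the abstract can be written out as follows. The hypotheses are the standard statement of partial mean independence, and the GMC is the explained-variance ratio of Zheng et al. (2012). The last line is only an assumed, natural form for the pGMC (the share of the variation left after Z that is explained by adding W); the abstract does not state the definition, so the paper's exact form may differ. The symbol X = (W, Z) is introduced here for illustration.

```latex
\begin{align*}
  % Partial mean independence: W adds nothing to the conditional mean once Z is known
  H_0 &: \ \mathbb{E}(Y \mid W, Z) = \mathbb{E}(Y \mid Z) \quad \text{almost surely}, \\
  H_1 &: \ \mathbb{P}\{\mathbb{E}(Y \mid W, Z) = \mathbb{E}(Y \mid Z)\} < 1, \\[4pt]
  % GMC of Zheng et al. (2012), with X = (W, Z)
  \mathrm{GMC}(Y \mid X) &= \frac{\operatorname{Var}\{\mathbb{E}(Y \mid X)\}}{\operatorname{Var}(Y)}
    = 1 - \frac{\mathbb{E}\{\operatorname{Var}(Y \mid X)\}}{\operatorname{Var}(Y)}, \\[4pt]
  % Assumed form of the partial version (not confirmed by the abstract)
  \mathrm{pGMC}(Y \mid W, Z) &\overset{\text{assumed}}{=}
    \frac{\mathbb{E}\big[\{\mathbb{E}(Y \mid W, Z) - \mathbb{E}(Y \mid Z)\}^2\big]}
         {\mathbb{E}\big[\{Y - \mathbb{E}(Y \mid Z)\}^2\big]}.
\end{align*}
```

Under this assumed form, the pGMC is zero exactly when H_0 holds and reduces to the GMC when there are no conditioning covariates Z, which matches the special case mentioned in the abstract.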