Variable screening is a useful research area for dealing with ultrahigh-dimensional data. When some predictors depend on the response marginally and others jointly, existing methods fall short: conditional screening is often unstable with respect to the choice of the conditional set, and iterative screening carries a heavy computational burden. In this article, we propose a new independence measure, named conditional martingale difference divergence (CMDH), that can be treated as either a conditional or a marginal independence measure. Under regularity conditions, we show that the sure screening property of CMDH holds for both marginally and jointly active variables. Based on this measure, we propose a kernel-based, model-free variable screening method that is efficient, flexible, and stable against high correlation among predictors and heterogeneity of the response. In addition, we provide a data-driven method to select the conditional set. In simulations and real data applications, we demonstrate the superior performance of the proposed method.