Variable screening is a useful research area for dealing with ultra-high-dimensional data. When the predictors include both marginally and jointly dependent ones with respect to the response, existing methods such as conditional screening and iterative screening often suffer, respectively, from instability with respect to the choice of the conditional set and from a heavy computational burden. In this paper, we propose a new independence measure, named conditional martingale difference divergence (CMDH), that can be treated as either a conditional or a marginal independence measure. Under regularity conditions, we show that the sure screening property of CMDH holds for both marginally and jointly active variables. Based on this measure, we propose a kernel-based, model-free variable screening method that is efficient, flexible, and stable against high correlation and heterogeneity. In addition, we provide a data-driven method for selecting the conditional set when it is unknown. In simulations and real data applications, we demonstrate the superior performance of the proposed method.