We tackle the problem of nonparametric variable selection, with a focus on discovering interactions between variables. With $p$ variables there are $O(p^s)$ possible order-$s$ interactions, making exhaustive search infeasible. It is nonetheless possible to identify the variables involved in interactions at only linear computational cost, $O(p)$. The trick is to maximize a class of parametrized nonparametric dependence measures that we call metric learning objectives; the landscape of these nonconvex objective functions is sensitive to interactions, even though the objectives themselves do not explicitly model interactions. Three properties make metric learning objectives highly attractive: (a) the stationary points of the objective are automatically sparse (i.e., they perform selection), so no explicit $\ell_1$ penalization is needed; (b) all stationary points of the objective exclude noise variables with high probability; (c) recovery of all signal variables is guaranteed without needing to reach the objective's global maximum or any special stationary point. The second and third properties mean that all of our theoretical results apply in the practical setting where one maximizes the metric learning objective by gradient ascent. While not all metric learning objectives enjoy good statistical power, we design an objective based on $\ell_1$ kernels that does exhibit favorable power: it recovers (i) main effects with $n \sim \log p$ samples, (ii) hierarchical interactions with $n \sim \log p$ samples, and (iii) order-$s$ pure interactions with $n \sim p^{2(s-1)} \log p$ samples.
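To make the mechanism concrete, here is a minimal sketch of the kind of procedure the abstract describes: projected gradient ascent on a parametrized kernel dependence measure with an ARD $\ell_1$ (Laplace) kernel. This is an illustration under stated assumptions, not the paper's method: we stand in a biased HSIC estimator for the (unspecified) metric learning objective, and the function names (`laplace_kernel`, `hsic_and_grad`, `select_variables`) and hyperparameters are hypothetical. The claimed automatic sparsity of stationary points is a property of the paper's specific objective and is not guaranteed by this sketch.

```python
import numpy as np

def laplace_kernel(X, w):
    """ARD l1 (Laplace) kernel: K[a,b] = exp(-sum_j w_j |X[a,j] - X[b,j]|)."""
    # D[a, b, j] holds the per-feature pairwise distance |X[a,j] - X[b,j]|.
    D = np.abs(X[:, None, :] - X[None, :, :])            # shape (n, n, p)
    return np.exp(-np.tensordot(D, w, axes=([2], [0]))), D

def hsic_and_grad(X, y, w):
    """Biased HSIC estimator between k_w(X) and a Gaussian kernel on y,
    plus its gradient with respect to the kernel weights w (assumed stand-in
    for the paper's metric learning objective)."""
    n = len(y)
    K, D = laplace_kernel(X, w)
    L = np.exp(-0.5 * (y[:, None] - y[None, :]) ** 2)    # kernel on response
    H = np.eye(n) - np.ones((n, n)) / n                  # centering matrix
    HLH = H @ L @ H
    hsic = np.sum(K * HLH) / (n - 1) ** 2
    # dK/dw_j = -D[:, :, j] * K, so the gradient contracts K*HLH against D.
    grad = -np.tensordot(K * HLH, D, axes=([0, 1], [0, 1])) / (n - 1) ** 2
    return hsic, grad

def select_variables(X, y, steps=500, lr=0.1, seed=0):
    """Projected gradient ascent on w >= 0; coordinates of w that end up
    (near) zero at the stationary point are treated as excluded variables."""
    rng = np.random.default_rng(seed)
    w = rng.uniform(0.5, 1.5, size=X.shape[1])
    for _ in range(steps):
        _, g = hsic_and_grad(X, y, w)
        w = np.maximum(0.0, w + lr * g)                  # keep weights nonnegative
    return w

# Toy usage: y depends on x0 (main effect) and the pure interaction x1*x2;
# x3 and x4 are noise variables.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
y = X[:, 0] + X[:, 1] * X[:, 2] + 0.1 * rng.standard_normal(200)
print(np.round(select_variables(X, y), 3))
```

Note that the selection decision is read off the stationary point of the ascent, matching the abstract's point that no special stationary point (let alone the global maximum) needs to be reached.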