Searching for Interactions: Why the Laplace Kernel is your Friend

We tackle the problem of nonparametric variable selection with a focus on discovering interactions between variables. With $p$ variables there are $O(p^s)$ possible order-$s$ interactions, making exhaustive search infeasible. It is nonetheless possible to identify the variables involved in interactions at only linear computational cost, $O(p)$. The trick is to maximize a class of parametrized nonparametric dependence measures which we call \emph{metric learning objectives}; the landscape of these nonconvex objective functions is sensitive to interactions, but the objectives themselves do not explicitly model interactions. Three properties make metric learning objectives highly attractive: (a) The stationary points of the objective are automatically sparse (i.e., the objective performs selection), so no explicit $\ell_1$ penalization is needed. (b) All stationary points of the objective exclude noise variables with high probability. (c) Recovery of all signal variables is guaranteed without needing to reach the objective's global maximum or special stationary points. The second and third properties mean that all our theoretical results apply in the practical case where one uses gradient ascent to maximize the metric learning objective. While not all metric learning objectives enjoy good statistical power, we design an objective based on $\ell_1$ kernels that does exhibit favorable power: it recovers (i) main effects with $n \sim \log p$ samples, (ii) hierarchical interactions with $n \sim \log p$ samples, and (iii) order-$s$ pure interactions with $n \sim p^{2(s-1)}\log p$ samples.
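
To make the gap between exhaustive search and the linear $O(p)$ cost concrete, the sketch below counts the candidate order-$s$ interactions among $p$ variables. It assumes that an order-$s$ interaction corresponds to an unordered subset of $s$ distinct variables, so the count is the binomial coefficient $\binom{p}{s} = O(p^s)$ for fixed $s$. The function name `interaction_count` and the values of $p$ and $s$ are illustrative choices, not taken from the paper.

```python
from math import comb


def interaction_count(p: int, s: int) -> int:
    """Number of unordered size-s subsets of p variables, i.e. C(p, s).

    Assumption (not from the paper): each order-s interaction is identified
    with one such subset, so this is the size of an exhaustive search.
    """
    return comb(p, s)


if __name__ == "__main__":
    # Compare the exhaustive candidate count with p itself, which is the
    # scale of a computation that is linear in the number of variables.
    for p in (100, 1_000, 10_000):
        for s in (2, 3):
            print(f"p={p:>6}  s={s}  candidates={interaction_count(p, s):,}")
```

Already at $p = 1000$ and $s = 3$ an exhaustive search would have to examine more than $10^8$ candidate subsets, whereas a method that is linear in $p$ works at the scale of $10^3$.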
