Title
A Self-Penalizing Objective Function for Scalable Interaction Detection
Authors
Abstract
We tackle the problem of nonparametric variable selection, with a focus on discovering interactions between variables. With $p$ variables there are $O(p^s)$ possible order-$s$ interactions, making exhaustive search infeasible. It is nonetheless possible to identify the variables involved in interactions with only linear computational cost, $O(p)$. The trick is to maximize a class of parametrized nonparametric dependence measures, which we call metric learning objectives; the landscape of these nonconvex objective functions is sensitive to interactions, but the objectives themselves do not explicitly model interactions. Three properties make metric learning objectives highly attractive: (a) The stationary points of the objective are automatically sparse (i.e., they perform selection) -- no explicit $\ell_1$ penalization is needed. (b) All stationary points of the objective exclude noise variables with high probability. (c) Recovery of all signal variables is guaranteed without needing to reach the objective's global maximum or any special stationary point. The second and third properties mean that all our theoretical results apply in the practical case where one uses gradient ascent to maximize the metric learning objective. While not all metric learning objectives enjoy good statistical power, we design an objective based on $\ell_1$ kernels that does exhibit favorable power: it recovers (i) main effects with $n \sim \log p$ samples, (ii) hierarchical interactions with $n \sim \log p$ samples, and (iii) order-$s$ pure interactions with $n \sim p^{2(s-1)} \log p$ samples.
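The procedure the abstract describes (gradient ascent on a parametrized kernel-based dependence measure with $\ell_1$ kernels, with no explicit sparsity penalty) can be made concrete with a small sketch. The following is a minimal illustration under assumptions, not the paper's exact method: it runs projected gradient ascent on a standard empirical HSIC estimator built from a weighted $\ell_1$ (Laplace) kernel, and the choice of estimator, step size, initialization, and helper names (`l1_kernel`, `hsic`, `select_variables`) are all ours, not the paper's.

```python
import numpy as np

def l1_kernel(X, beta):
    # Weighted l1 (Laplace) kernel: K[i, j] = exp(-sum_k beta_k * |X[i, k] - X[j, k]|).
    # beta >= 0 weights each variable; beta_k = 0 removes variable k from the kernel.
    D = np.abs(X[:, None, :] - X[None, :, :])            # (n, n, p) coordinatewise distances
    return np.exp(-np.tensordot(D, beta, axes=(2, 0)))   # (n, n) Gram matrix

def hsic(K, L):
    # Standard (biased) empirical HSIC between two Gram matrices.
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n                  # centering matrix
    return np.trace(H @ K @ H @ L) / (n - 1) ** 2

def grad_fd(f, beta, eps=1e-6):
    # Finite-difference gradient, kept simple for the sketch; a real
    # implementation would differentiate the kernel analytically.
    g = np.zeros_like(beta)
    for k in range(beta.size):
        e = np.zeros_like(beta)
        e[k] = eps
        g[k] = (f(beta + e) - f(beta - e)) / (2 * eps)
    return g

def select_variables(X, y, steps=200, lr=1.0):
    # Projected gradient ascent over beta >= 0. Note there is no explicit
    # l1 penalty anywhere: any sparsity must come from the objective itself.
    n, p = X.shape
    L_y = np.exp(-np.abs(y[:, None] - y[None, :]))       # l1 kernel on the response
    f = lambda b: hsic(l1_kernel(X, b), L_y)
    beta = np.full(p, 1.0 / p)                           # generic small positive start
    for _ in range(steps):
        beta = np.maximum(beta + lr * grad_fd(f, beta), 0.0)
    return beta

# Toy run: y has a main effect in X[:, 0] and a pure interaction X[:, 1] * X[:, 2].
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
y = X[:, 0] + X[:, 1] * X[:, 2] + 0.1 * rng.standard_normal(100)
print(np.round(select_variables(X, y), 3))  # weight should concentrate on the signals
```

Per claims (a)-(c), the notable feature is what is absent: no $\ell_1$ penalty appears in the update, sparsity is supposed to emerge from the landscape of the objective itself, and gradient ascent from a generic starting point suffices, with no special stationary point or global maximum needing to be reached.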