Paper Title

Powerful Knockoffs via Minimizing Reconstructability

Authors

Asher Spector, Lucas Janson

Abstract

Model-X knockoffs allow analysts to perform feature selection using almost any machine learning algorithm while still provably controlling the expected proportion of false discoveries. To apply model-X knockoffs, one must construct synthetic variables, called knockoffs, which effectively act as controls during feature selection. The gold standard for constructing knockoffs has been to minimize the mean absolute correlation (MAC) between features and their knockoffs. Surprisingly, we prove that this procedure can be powerless in extremely easy settings, including Gaussian linear models with correlated exchangeable features. The key problem is that minimizing the MAC creates strong joint dependencies between the features and knockoffs, which allow machine learning algorithms to partially or fully reconstruct the effect of the features on the response using the knockoffs. To improve the power of knockoffs, we propose generating knockoffs that minimize the reconstructability (MRC) of the features, and we demonstrate our proposal for Gaussian features by showing that it is computationally efficient, robust, and powerful. We also prove that certain MRC knockoffs minimize a natural definition of estimation error in Gaussian linear models. Furthermore, in an extensive set of simulations, we find many settings with correlated features in which MRC knockoffs dramatically outperform MAC-minimizing knockoffs, and no settings in which MAC-minimizing knockoffs outperform MRC knockoffs by more than a very slight margin. We implement our methods, along with a host of others from the knockoffs literature, in a new open-source Python package, knockpy.
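The joint dependence described in the abstract can be checked with a small amount of arithmetic. The sketch below is an illustration only; it is not taken from the paper's code or from knockpy, and all function names are hypothetical. It assumes Gaussian knockoffs whose joint covariance with the features is [[Σ, Σ − S], [Σ − S, Σ]] with S = sI, and it assumes the closed form s = min(1, 2(1 − ρ)) for the MAC-minimizing choice when Σ is equicorrelated with correlation ρ ≥ 0. Under those assumptions, Cov(X + X̃) = 4Σ − 2S, so the variance of (X_j + X̃_j) − (X_k + X̃_k) can be read off directly.

```python
def equicorrelated(p, rho):
    """Covariance matrix with unit variances and pairwise correlation rho."""
    return [[1.0 if i == j else rho for j in range(p)] for i in range(p)]


def mac_s_value(rho):
    """ASSUMPTION: closed-form MAC-minimizing s for equicorrelated features, rho >= 0."""
    return min(1.0, 2.0 * (1.0 - rho))


def sum_covariance(p, rho):
    """Cov(X + X~) = 4*Sigma - 2*S under the assumed joint covariance, with S = s*I."""
    sigma = equicorrelated(p, rho)
    s = mac_s_value(rho)
    return [
        [4.0 * sigma[i][j] - (2.0 * s if i == j else 0.0) for j in range(p)]
        for i in range(p)
    ]


def var_of_difference(cov, j, k):
    """Var(Z_j - Z_k) for a random vector Z with covariance matrix cov."""
    return cov[j][j] + cov[k][k] - 2.0 * cov[j][k]


if __name__ == "__main__":
    for rho in (0.3, 0.6):
        cov = sum_covariance(4, rho)
        print(rho, var_of_difference(cov, 0, 1))
```

When ρ ≥ 0.5 the variance of the difference is zero, meaning X_j + X̃_j coincides with X_k + X̃_k for every pair of features under these assumptions. A model that sees all features and knockoffs could then rewrite X_j in terms of X̃_j and one other feature-knockoff pair, which is the kind of reconstruction the abstract identifies as the source of lost power. For ρ < 0.5 the variance stays strictly positive and no such exact identity holds.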
