Paper Title

Ultra-marginal Feature Importance: Learning from Data with Causal Guarantees

Authors

Joseph Janssen, Vincent Guan, Elina Robeva

Abstract


Scientists frequently prioritize learning from data rather than training the best possible model; however, research in machine learning often prioritizes the latter. Marginal contribution feature importance (MCI) was developed to break this trend by providing a useful framework for quantifying the relationships in data. In this work, we aim to improve upon the theoretical properties, performance, and runtime of MCI by introducing ultra-marginal feature importance (UMFI), which uses dependence removal techniques from the AI fairness literature as its foundation. We first propose axioms for feature importance methods that seek to explain the causal and associative relationships in data, and we prove that UMFI satisfies these axioms under basic assumptions. We then show on real and simulated data that UMFI performs better than MCI, especially in the presence of correlated interactions and unrelated features, while partially learning the structure of the causal graph and reducing the exponential runtime of MCI to super-linear.
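The abstract describes UMFI as evaluating a feature after removing its statistical dependence from the remaining features. A toy sketch of that idea, assuming a simple linear residualization as the dependence-removal step and in-sample R² of an OLS fit as the evaluation function (the paper itself uses more general dependence-removal techniques from the AI fairness literature, and the helper names below are hypothetical):

```python
import numpy as np

def r2_linear(X, y):
    """Evaluation function nu(S): in-sample R^2 of an OLS fit of y on X."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1.0 - resid.var() / y.var()

def remove_dependence(F, f):
    """Residualize each column of F on f (linear stand-in for dependence removal)."""
    f1 = np.column_stack([np.ones(len(f)), f])
    coef, *_ = np.linalg.lstsq(f1, F, rcond=None)
    return F - f1 @ coef

def umfi_score(X, y, j):
    """Score feature j: nu(S_j with feature j) - nu(S_j), where S_j is the
    remaining feature set preprocessed to remove dependence on feature j."""
    others = np.delete(X, j, axis=1)
    S = remove_dependence(others, X[:, j])
    return r2_linear(np.column_stack([S, X[:, j]]), y) - r2_linear(S, y)

# Synthetic data: x1 relevant, x2 correlated with x1 and relevant, x3 irrelevant.
rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
x3 = rng.normal(size=n)
y = x1 + x2 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

scores = [umfi_score(X, y, j) for j in range(3)]
```

Unlike a plain marginal comparison, the correlated copy x2 cannot mask x1 here, because x2's dependence on x1 is stripped out before x1 is scored; the irrelevant feature x3 receives a score near zero.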
