Paper title
Assessing variable activity for Bayesian regression trees
Paper authors
Paper abstract
Bayesian Additive Regression Trees (BART) are non-parametric models that can capture complex exogenous variable effects. In any regression problem, it is often of interest to learn which variables are most active. Variable activity in BART is usually measured by counting, for each variable, the number of tree splits on that variable. Such one-way counts are fast to compute. Despite their convenience, one-way counts have several issues: they are statistically unjustified, they cannot distinguish between main effects and interaction effects, and they become inflated when measuring interaction effects. A well-established alternative in the literature is Sobol' indices, a variance-based global sensitivity analysis technique. However, these indices often require Monte Carlo integration, which can be computationally expensive. This paper provides analytic expressions for Sobol' indices for BART posterior samples. These expressions are easy to interpret and computationally feasible. Furthermore, we show a fascinating connection between first-order (main-effects) Sobol' indices and one-way counts. We also introduce a novel ranking method and use it to demonstrate that the proposed indices preserve the Sobol'-based rank order of variable importance. Finally, we compare these methods using analytic test functions and the En-ROADS climate impacts simulator.
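To make the contrast between one-way counts and Sobol' indices concrete, here is a minimal sketch on a hypothetical two-tree ensemble. It is not the paper's analytic method and not any BART package's API. The tuple tree format and the names `predict`, `ensemble_predict`, `split_counts` and `sobol_mc` are illustrative assumptions. Inputs are assumed independent and Uniform(0, 1). The Sobol' indices are estimated by plain Monte Carlo (Saltelli's first-order estimator and Jansen's total-effect estimator), which is the costly route the abstract mentions: it needs `(d + 2) * n` ensemble evaluations. One tree is a pure main effect of `x0`. The other is a pure XOR-style interaction of `x1` and `x2`, and `x3` is inert.

```python
import numpy as np

# Hypothetical tree format (assumption, not BART's internal representation):
# an internal node is (var, cut, left, right); a leaf is a float.

def predict(tree, X):
    """Evaluate one tree on the rows of X, shape (n, d)."""
    if not isinstance(tree, tuple):
        return np.full(X.shape[0], float(tree))
    var, cut, left, right = tree
    go_left = X[:, var] < cut
    out = np.empty(X.shape[0])
    out[go_left] = predict(left, X[go_left])
    out[~go_left] = predict(right, X[~go_left])
    return out

def ensemble_predict(trees, X):
    """Sum-of-trees prediction, as in an additive tree ensemble."""
    return sum(predict(t, X) for t in trees)

def split_counts(trees, d):
    """One-way counts: number of splitting rules that use each variable."""
    counts = np.zeros(d, dtype=int)
    def walk(node):
        if isinstance(node, tuple):
            var, _, left, right = node
            counts[var] += 1
            walk(left)
            walk(right)
    for t in trees:
        walk(t)
    return counts

def sobol_mc(f, d, n=100_000, seed=0):
    """Monte Carlo first-order and total Sobol' indices, inputs ~ U(0,1)^d."""
    rng = np.random.default_rng(seed)
    A = rng.random((n, d))
    B = rng.random((n, d))
    fA, fB = f(A), f(B)
    var = np.var(np.concatenate([fA, fB]))
    first, total = np.empty(d), np.empty(d)
    for i in range(d):
        ABi = A.copy()
        ABi[:, i] = B[:, i]          # A with column i taken from B
        fABi = f(ABi)
        first[i] = np.mean(fB * (fABi - fA)) / var        # Saltelli (2010)
        total[i] = 0.5 * np.mean((fA - fABi) ** 2) / var  # Jansen (1999)
    return first, total

tree_main = (0, 0.5, -1.0, 1.0)                      # main effect of x0
tree_inter = (1, 0.5, (2, 0.5, -0.5, 0.5),           # x1-x2 interaction only
                      (2, 0.5, 0.5, -0.5))
trees = [tree_main, tree_inter]

print(split_counts(trees, 4))  # -> [1 1 2 0]
first, total = sobol_mc(lambda X: ensemble_predict(trees, X), d=4)
# Analytic values for this toy: first-order [0.8, 0, 0, 0],
# total [0.8, 0.2, 0.2, 0]; the estimates should be close to these.
print(np.round(first, 2), np.round(total, 2))
```

In this toy, the counts rank `x2` as the most active variable only because it appears twice in the interaction tree, while its first-order index is zero: all of its activity is interaction, which the total index (0.2) attributes equally to `x1` and `x2`. This mirrors the abstract's point that one-way counts cannot separate main effects from interactions and are inflated by them.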