Paper title
bsnsing: A decision tree induction method based on recursive optimal boolean rule composition
Paper authors
Abstract
This paper proposes a new mixed-integer programming (MIP) formulation to optimize split rule selection in the decision tree induction process, and develops an efficient search algorithm that solves practical instances of the MIP model faster than commercial solvers. The formulation is novel in that it directly maximizes the Gini reduction, an effective split selection criterion that has not previously been modeled in a mathematical program because of its nonconvexity. The proposed approach differs from other optimal classification tree models in that it does not attempt to optimize the whole tree; the flexibility of the recursive partitioning scheme is therefore retained, and the optimization model is more tractable. The approach is implemented in an open-source R package named bsnsing. Benchmarking experiments on 75 open data sets suggest that bsnsing trees discriminate new cases better than trees trained by other decision tree codes, including the rpart, C50, party and tree packages in R. Compared to other optimal decision tree packages, including DL8.5, OSDT, GOSDT and, indirectly, others, bsnsing stands out in its training speed, ease of use and broader applicability, without sacrificing prediction accuracy.
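To make the split criterion named in the abstract concrete, the following is a minimal sketch of the standard Gini reduction of a binary split, applied to a split rule composed as the OR of two boolean conditions. It is not the paper's MIP formulation or search algorithm, and it does not use the bsnsing package. The function names (`gini`, `gini_reduction`), the toy labels, and the two conditions (`x1_small`, `x2_red`) are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_reduction(labels, in_left):
    """Impurity of the parent node minus the size-weighted impurity
    of the two child nodes induced by the boolean mask `in_left`."""
    left = [y for y, m in zip(labels, in_left) if m]
    right = [y for y, m in zip(labels, in_left) if not m]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted


# Hypothetical toy data: six cases, two classes.
labels = [1, 1, 1, 0, 0, 0]

# Two hypothetical boolean conditions evaluated on each case.
x1_small = [True, True, False, False, False, False]
x2_red = [False, False, True, False, False, False]

# A composed boolean rule (assumed form): x1_small OR x2_red.
rule = [a or b for a, b in zip(x1_small, x2_red)]

print("single condition:", round(gini_reduction(labels, x1_small), 3))
print("composed rule:   ", round(gini_reduction(labels, rule), 3))
```

On this toy data the composed rule separates the two classes perfectly, so its Gini reduction equals the parent impurity of 0.5, whereas the first condition alone achieves a reduction of about 0.25. Enumerating rules this way becomes impractical as the number of candidate conditions grows, which is the kind of search the abstract says the MIP formulation and its dedicated algorithm are designed to handle.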