Paper Title

OpenFE: Automated Feature Generation with Expert-level Performance

Authors

Tianping Zhang, Zheyu Zhang, Zhiyuan Fan, Haoyan Luo, Fengyuan Liu, Qian Liu, Wei Cao, Jian Li

Abstract

The goal of automated feature generation is to liberate machine learning experts from the laborious task of manual feature generation, which is crucial for improving learning performance on tabular data. The major challenge in automated feature generation is to efficiently and accurately identify effective features from a vast pool of candidate features. In this paper, we present OpenFE, an automated feature generation tool that provides competitive results against machine learning experts. OpenFE achieves high efficiency and accuracy with two components: 1) a novel feature boosting method for accurately evaluating the incremental performance of candidate features and 2) a two-stage pruning algorithm that performs feature pruning in a coarse-to-fine manner. Extensive experiments on ten benchmark datasets show that OpenFE outperforms existing baseline methods by a large margin. We further evaluate OpenFE in two Kaggle competitions with thousands of data science teams participating. In the two competitions, features generated by OpenFE with a simple baseline model can beat 99.3% and 99.6% of data science teams, respectively. In addition to the empirical results, we provide a theoretical perspective to show that feature generation can be beneficial in a simple yet representative setting. The code is available at https://github.com/ZhangTP1996/OpenFE.
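
The abstract names the two components without describing how they work, so the following is only an illustrative sketch of the general idea, not OpenFE's implementation. It assumes that "incremental performance" can be approximated by how much a single candidate feature reduces the residual error of an existing baseline prediction (a one-split regression stump stands in for a boosted model here), and that "coarse-to-fine" pruning can be approximated by successive halving over growing subsamples. All function names, the toy data, and the parameters `start` and `keep` are hypothetical.

```python
import random


def stump_gain(feature, residual):
    """Largest drop in mean squared error obtainable by correcting the
    baseline residuals with one threshold split on `feature`."""
    pairs = sorted(zip(feature, residual))
    n = len(pairs)
    total = sum(r for _, r in pairs)
    best, left_sum = 0.0, 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        right_sum = total - left_sum
        # SSE reduction when each side predicts its own mean residual
        gain = left_sum ** 2 / i + right_sum ** 2 / (n - i)
        best = max(best, gain)
    return best / n


def successive_halving(candidates, residual, start=128, keep=1, seed=0):
    """Coarse-to-fine pruning: score all candidates on a small subsample,
    keep the better half, then double the subsample and repeat."""
    rng = random.Random(seed)
    n = len(residual)
    alive = list(candidates)
    size = min(start, n)
    while len(alive) > keep:
        idx = rng.sample(range(n), size)
        sub_res = [residual[i] for i in idx]
        alive.sort(
            key=lambda name: stump_gain([candidates[name][i] for i in idx], sub_res),
            reverse=True,
        )
        alive = alive[: max(keep, len(alive) // 2)]
        size = min(2 * size, n)
    return alive


# Toy data: the target depends on the product of two base features.
rng = random.Random(42)
x1 = [rng.uniform(-1, 1) for _ in range(1000)]
x2 = [rng.uniform(-1, 1) for _ in range(1000)]
y = [a * b + rng.gauss(0, 0.05) for a, b in zip(x1, x2)]

# Baseline "model": predict the mean; residuals are what is left to explain.
mean_y = sum(y) / len(y)
residual = [v - mean_y for v in y]

# Hypothetical candidate pool built from simple operators on base features.
candidates = {
    "x1*x2": [a * b for a, b in zip(x1, x2)],
    "x1+x2": [a + b for a, b in zip(x1, x2)],
    "x1-x2": [a - b for a, b in zip(x1, x2)],
    "noise": [rng.uniform(-1, 1) for _ in range(1000)],
}

full_gains = {name: stump_gain(col, residual) for name, col in candidates.items()}
survivors = successive_halving(candidates, residual)
print(survivors, full_gains)
```

In this sketch the gain is measured on the same rows used to fit the split, which favors noisy features slightly; a more careful version would score each candidate on held-out data. Scoring against the baseline's residuals rather than retraining a full model per candidate is what keeps each evaluation cheap.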
