Title
Primal Estimated Subgradient Solver for SVM for Imbalanced Classification
Authors
Abstract
We aim to demonstrate in experiments that our cost-sensitive PEGASOS SVM achieves good performance on imbalanced data sets with majority-to-minority ratios ranging from 8.6:1 to 130:1, and to ascertain whether including the intercept (bias), regularization, and parameters affects performance on our selection of datasets. Although many resort to SMOTE methods, we aim for a less computationally intensive method. We evaluate performance by examining learning curves. These curves diagnose whether we overfit or underfit, or whether the random sample of data chosen during the process was not random or diverse enough in the dependent-variable class for the algorithm to generalize to unseen examples. We will also plot the hyperparameters against the test and train error in validation curves. We benchmark our PEGASOS cost-sensitive SVM's results against Ding's LINEAR SVM DECIDL method. He obtained an ROC-AUC of 0.5 on one dataset. Our work extends Ding's work by incorporating kernels into the SVM. We will use Python rather than MATLAB, as Python has dictionaries for storing mixed data types during multi-parameter cross-validation.
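To make the approach concrete, below is a minimal sketch of a cost-sensitive PEGASOS update in Python. It is not the paper's implementation: the per-class weights `cost_pos` and `cost_neg` (e.g. set near the majority-to-minority ratio) and all default values are illustrative assumptions; the step size 1/(λt) and the shrink-then-project-free update follow the standard linear PEGASOS stochastic subgradient scheme, with the hinge subgradient scaled by the class cost.

```python
import numpy as np

def cs_pegasos(X, y, lam=0.01, T=2000, cost_pos=10.0, cost_neg=1.0, seed=0):
    """Cost-sensitive PEGASOS sketch: stochastic subgradient descent on a
    class-weighted hinge loss. `cost_pos`/`cost_neg` are assumed per-class
    misclassification costs (not values from the paper); y must be in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, T + 1):
        i = rng.integers(n)                 # draw one random training example
        eta = 1.0 / (lam * t)               # PEGASOS step size 1/(lambda * t)
        cost = cost_pos if y[i] == 1 else cost_neg
        margin = y[i] * X[i].dot(w)         # check margin before the update
        w *= (1.0 - eta * lam)              # shrink from the L2 regularizer
        if margin < 1:                      # hinge margin violated:
            w += eta * cost * y[i] * X[i]   #   cost-weighted subgradient step
    return w
```

The only change relative to plain PEGASOS is the `cost` factor on the subgradient step, which penalizes minority-class margin violations more heavily; a kernelized variant, as the abstract proposes, would instead maintain per-example dual coefficients.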