Paper title
On The Effectiveness of One-Class Support Vector Machine in Different Defect Prediction Scenarios
Paper authors
Paper abstract
Defect prediction aims to identify software components that are likely to cause faults before the software is made available to the end user. To date, this task has been modeled as a two-class classification problem; however, its nature also allows it to be formulated as a one-class classification task. Previous studies show that One-Class Support Vector Machine (OCSVM) can outperform two-class classifiers for within-project defect prediction, but that it is not effective when employed at a finer granularity (i.e., commit-level defect prediction). In this paper, we further investigate whether learning from one class only is sufficient to produce effective defect prediction models in two other scenarios (i.e., granularities), namely cross-version and cross-project defect prediction, and we replicate the previous work at within-project granularity for completeness. Our empirical results confirm that OCSVM performance remains low at different granularity levels; that is, it is outperformed by the two-class Random Forest (RF) classifier for both cross-version and cross-project defect prediction. While we cannot conclude that OCSVM is the best classifier, our results still show interesting findings. Although OCSVM does not outperform RF, it still achieves performance superior to its two-class counterpart (i.e., SVM) as well as the other two-class classifiers studied herein. We also observe that OCSVM is more suitable for cross-version and cross-project defect prediction than for within-project defect prediction, suggesting that it performs better with heterogeneous data. We encourage further research on one-class classifiers for defect prediction, as these techniques may serve as an alternative when data about defective modules is scarce or not available.
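To make the three scenarios concrete, the following is a minimal sketch of how the training and test sets might be formed in each case, and of what a one-class learner would see. It is not the authors' experimental setup. The `Module` record, the toy data, the `split` and `one_class_view` helpers, and the simple alternating hold-out used for the within-project case are all assumptions made for illustration, as is the choice of non-defective modules as the single training class.

```python
from typing import NamedTuple


class Module(NamedTuple):
    project: str     # hypothetical project name
    version: int     # hypothetical release number
    loc: int         # one illustrative metric (lines of code)
    defective: bool  # ground-truth label


# Toy data, invented purely for illustration.
DATA = [
    Module("A", 1, 120, False),
    Module("A", 1, 900, True),
    Module("A", 2, 150, False),
    Module("A", 2, 870, True),
    Module("B", 1, 200, False),
    Module("B", 1, 640, True),
]


def split(data, scenario, project, version):
    """Return (train, test) for the target project/version under a scenario."""
    target = [m for m in data if m.project == project and m.version == version]
    if scenario == "within-project":
        # Same project and release: alternate modules between train and test
        # (a simplification; real studies typically use cross-validation).
        return target[::2], target[1::2]
    if scenario == "cross-version":
        # Train on earlier releases of the same project.
        train = [m for m in data if m.project == project and m.version < version]
        return train, target
    if scenario == "cross-project":
        # Train on every other project.
        train = [m for m in data if m.project != project]
        return train, target
    raise ValueError(f"unknown scenario: {scenario}")


def one_class_view(train):
    """Keep only non-defective modules (assumed here to be the single class)."""
    return [m for m in train if not m.defective]


train, test = split(DATA, "cross-version", project="A", version=2)
one_class_train = one_class_view(train)
```

A two-class classifier such as RF would be fit on `train` with both labels, whereas a one-class model would be fit on `one_class_train` only. This is why the one-class formulation remains usable when defective examples are scarce or unavailable.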