Paper title
An ensemble learning approach for software semantic clone detection
Authors
Abstract
Code cloning is a serious problem in software and can lead to software defects, maintenance overhead, and licensing violations. Clone detection is therefore important for reducing maintenance effort and improving code quality during software evolution. A variety of clone detection techniques have been proposed to identify similar code in software. However, few of them can efficiently detect semantic clones (functionally similar code without any syntactic resemblance). Recently, several deep learning based clone detectors have been proposed to detect semantic clones, but these approaches incur high costs in data labelling and model training. In this paper, we propose a novel approach that leverages word embedding and ensemble learning techniques to detect semantic clones. Our evaluation on a commonly used clone benchmark, BigCloneBench, shows that our approach significantly improves the precision and recall of semantic clone detection in comparison to a token-based clone detector, SourcererCC, and another deep learning based clone detector, CDLH.