Paper Title

Towards Exploring the Code Reuse from Stack Overflow during Software Development

Authors

Huang, Yuan; Xu, Furen; Zhou, Haojie; Chen, Xiangping; Zhou, Xiaocong; Wang, Tong

Abstract

As one of the most well-known programmer Q&A websites, Stack Overflow (SO) serves tens of thousands of developers every day. Previous work has shown that many developers reuse code snippets from SO when they find an answer that functionally matches a programming problem they encounter during development. To study how programmers reuse code from SO during project development, we conduct a comprehensive empirical study. First, to capture the development activities of programmers, we collect 342,148 modified code snippets from commits in 793 open-source Java projects; these modified snippets reflect the programming problems encountered during development. We also collect the code snippets from 1,355,617 posts on SO. Then, we employ CCFinder to detect code clones between the modified code from commits and the code from SO, and further analyze code reuse when programmers solve programming problems during development. We compute the code reuse ratio of the modified code snippets in the commits of each project in different years. The results show that the average code reuse ratio is 6.32% and the maximum is 8.38%. The code reuse ratio in project commits has increased year by year, and the proportion of code reuse in newly established projects is higher than that in old projects. We also find that some projects reuse code snippets from many years ago. Additionally, experienced developers seem more likely to reuse the knowledge on SO. Moreover, the code reuse ratio in bug-related commits (6.67%) is slightly higher than that in non-bug-related commits (6.59%). Finally, the code reuse ratio in Java class files that have undergone multiple modifications (14.44%) is more than double the overall code reuse ratio (6.32%).
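The abstract does not spell out how the code reuse ratio is calculated. As a rough illustration only, the sketch below assumes the ratio for a project in a given year is the share of its modified code snippets that a clone detector flagged as matching SO code. The function name `reuse_ratio_by_year`, the tuple layout, and the sample data are hypothetical and are not taken from the paper; the clone-detection step itself (CCFinder in the study) is represented only by a precomputed boolean flag.

```python
from collections import defaultdict


def reuse_ratio_by_year(snippets):
    """Return {(project, year): reuse ratio} for an iterable of records.

    Each record is a (project, year, is_clone) tuple, where is_clone is
    True when the modified snippet was flagged as a clone of SO code.
    Assumption: ratio = flagged snippets / all modified snippets.
    """
    totals = defaultdict(int)
    clones = defaultdict(int)
    for project, year, is_clone in snippets:
        key = (project, year)
        totals[key] += 1
        if is_clone:
            clones[key] += 1
    return {key: clones[key] / totals[key] for key in totals}


# Hypothetical sample data, not results from the study.
sample = [
    ("proj-a", 2018, True),
    ("proj-a", 2018, False),
    ("proj-a", 2018, False),
    ("proj-a", 2018, False),
    ("proj-b", 2019, False),
    ("proj-b", 2019, True),
]
ratios = reuse_ratio_by_year(sample)
```

Under this assumed definition, averaging the per-project, per-year values would give an aggregate figure comparable in kind to the reported average, though the paper may aggregate differently.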
