Paper Title
Automatic Identification of Self-Admitted Technical Debt from Four Different Sources
Paper Authors
Paper Abstract
Technical debt refers to taking shortcuts to achieve short-term goals while sacrificing the long-term maintainability and evolvability of software systems. A large part of technical debt is explicitly reported by the developers themselves; this is commonly referred to as Self-Admitted Technical Debt or SATD. Previous work has focused on identifying SATD from source code comments and issue trackers. However, there are no approaches available for automatically identifying SATD from other sources such as commit messages and pull requests, or by combining multiple sources. Therefore, we propose and evaluate an approach for automated SATD identification that integrates four sources: source code comments, commit messages, pull requests, and issue tracking systems. Our findings show that our approach outperforms baseline approaches and achieves an average F1-score of 0.611 when detecting four types of SATD (i.e., code/design debt, requirement debt, documentation debt, and test debt) from the four aforementioned sources. Thereafter, we analyze 23.6M code comments, 1.3M commit messages, 3.7M issue sections, and 1.7M pull request sections to characterize SATD in 103 open-source projects. Furthermore, we investigate the SATD keywords and relations between SATD in different sources. The findings indicate, among others, that: 1) SATD is evenly spread among all sources; 2) issues and pull requests are the two most similar sources regarding the number of shared SATD keywords, followed by commit messages, and then followed by code comments; 3) there are four kinds of relations between SATD items in the different sources.
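The abstract reports an "average F1-score of 0.611" across four SATD types but does not say how the average is taken. One common reading is an unweighted (macro) mean of per-type F1 scores, where each type's F1 is 2PR / (P + R). The sketch below illustrates only that calculation, under that assumption. The label strings, the extra "non-SATD" class, and the toy predictions are hypothetical and are not data from the paper.

```python
from typing import List, Sequence

# Hypothetical label names for the four SATD types listed in the abstract.
# Items that are not SATD are assumed to carry some other label (e.g. "non-SATD").
SATD_TYPES: List[str] = ["code/design", "requirement", "documentation", "test"]


def f1_for_class(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """F1 = 2PR / (P + R) for one label, treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is 0 (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_f1(y_true: Sequence[str], y_pred: Sequence[str],
               labels: Sequence[str] = SATD_TYPES) -> float:
    """Unweighted (macro) mean of the per-type F1 scores (an assumed scheme)."""
    scores = [f1_for_class(y_true, y_pred, label) for label in labels]
    return sum(scores) / len(scores)


# Toy example (hypothetical data, not from the study).
y_true = ["test", "test", "requirement", "documentation", "code/design", "non-SATD"]
y_pred = ["test", "requirement", "requirement", "documentation", "code/design", "code/design"]

print(round(average_f1(y_true, y_pred), 4))  # 0.75
```

Here three types score 2/3 and documentation debt scores 1.0, giving a macro average of 0.75. A weighted average, or one taken per source instead of per type, would generally give a different number.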