Paper Title
A Decade of Code Comment Quality Assessment: A Systematic Literature Review
Paper Authors
Paper Abstract
Code comments are important artifacts in software systems and play a paramount role in many software engineering (SE) tasks related to maintenance and program comprehension. However, while it is widely accepted that high quality matters in code comments just as it matters in source code, assessing comment quality in practice is still an open problem. First and foremost, there is no unique definition of quality when it comes to evaluating code comments. The few existing studies on this topic focus instead on specific quality attributes that can be easily quantified and measured. Existing techniques and corresponding tools may also focus on comments bound to a specific programming language, and may only deal with comments that have specific scopes and clear goals (e.g., Javadoc comments at the method level, or in-body comments describing TODOs to be addressed). In this paper, we present a Systematic Literature Review (SLR) of the last decade of research in SE to answer the following research questions: (i) What types of comments do researchers focus on when assessing comment quality? (ii) What quality attributes (QAs) do they consider? (iii) Which tools and techniques do they use to assess comment quality? and (iv) How do they evaluate their studies on comment quality assessment in general? Our evaluation, based on the analysis of 2353 papers and the actual review of 47 relevant ones, shows that (i) most studies and techniques focus on comments in Java code, and thus may not generalize to other languages, and (ii) the analyzed studies focus on four main QAs out of a total of 21 QAs identified in the literature, with a clear predominance of checking consistency between comments and the code. We observe that researchers rely on manual assessment and specific heuristics rather than on automated assessment of comment quality attributes.