Paper Title

Semantic Similarity Metrics for Evaluating Source Code Summarization

Authors

Sakib Haque, Zachary Eberhart, Aakash Bansal, Collin McMillan

Abstract

Source code summarization involves creating brief descriptions of source code in natural language. These descriptions are a key component of software documentation such as JavaDocs. Automatic code summarization is a prized target of software engineering research, due to the high value summaries have to programmers and the simultaneously high cost of writing and maintaining documentation by hand. Current work is almost all based on machine models trained via big data input. Large datasets of code examples paired with summaries of that code are used to train a model such as an encoder-decoder neural model. The output predictions of the model are then evaluated against a set of reference summaries: the input is code not seen by the model, and each prediction is compared to a reference. The means by which a prediction is compared to a reference is essentially word overlap, calculated via a metric such as BLEU or ROUGE. The problem with using word overlap is that not all words in a sentence have the same importance, and many words have synonyms. The result is that the calculated similarity may not match the similarity perceived by human readers. In this paper, we conduct an experiment to measure the degree to which various word overlap metrics correlate with human-rated similarity of predicted and reference summaries. We evaluate alternatives based on current work in semantic similarity metrics and propose recommendations for evaluation of source code summarization.
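The word-overlap problem described in the abstract can be illustrated with a minimal sketch. The function below is a simplified clipped unigram precision, not the full BLEU or ROUGE computation and not the metric proposed in the paper; the function name and the three example summaries are hypothetical and chosen only for illustration.

```python
from collections import Counter


def unigram_precision(prediction: str, reference: str) -> float:
    """Fraction of predicted words that also occur in the reference.

    Counts are clipped so a word is credited at most as many times
    as it appears in the reference. This is a simplified stand-in
    for word-overlap metrics, not an implementation of BLEU or ROUGE.
    """
    pred_tokens = prediction.lower().split()
    if not pred_tokens:
        return 0.0
    pred_counts = Counter(pred_tokens)
    ref_counts = Counter(reference.lower().split())
    overlap = sum(min(count, ref_counts[word]) for word, count in pred_counts.items())
    return overlap / len(pred_tokens)


# Hypothetical summaries (assumed for illustration only).
reference = "returns the length of the list"
paraphrase = "gets the size of the array"    # same meaning, different words
wrong = "returns the name of the list"       # different meaning, similar words

score_paraphrase = unigram_precision(paraphrase, reference)
score_wrong = unigram_precision(wrong, reference)

print(f"paraphrase: {score_paraphrase:.3f}")
print(f"wrong:      {score_wrong:.3f}")
```

Under these assumptions the summary with the wrong meaning scores higher than the faithful paraphrase, because every word is weighted equally and synonyms earn no credit. This is the kind of mismatch with human judgment that the abstract attributes to word-overlap metrics.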
