An Extractive-and-Abstractive Framework for Source Code Summarization

Paper Title

An Extractive-and-Abstractive Framework for Source Code Summarization

Authors

Sun, Weisong; Fang, Chunrong; Chen, Yuchen; Zhang, Quanjun; Tao, Guanhong; Han, Tingxu; Ge, Yifei; You, Yudu; Luo, Bin

Abstract

(Source) code summarization aims to automatically generate natural language summaries/comments for a given code snippet. Such summaries play a key role in helping developers understand and maintain source code. Existing code summarization techniques can be categorized into extractive methods and abstractive methods. Extractive methods use retrieval techniques to extract a subset of important statements and keywords from the code snippet, and generate a summary that preserves the factual details in those statements and keywords. However, such a subset may miss identifier or entity naming, and consequently the naturalness of the generated summary is usually poor. Abstractive methods can generate human-written-like summaries by leveraging encoder-decoder models from the neural machine translation domain. However, the generated summaries often miss important factual details. To generate human-written-like summaries that preserve factual details, we propose a novel extractive-and-abstractive framework. The extractive module in the framework performs the task of extractive code summarization: it takes in the code snippet and predicts the important statements containing key factual details. The abstractive module performs the task of abstractive code summarization: it takes in the entire code snippet and the important statements in parallel, and generates a succinct, human-written-like natural language summary. We evaluate the effectiveness of our technique, called EACS, by conducting extensive experiments on three datasets involving six programming languages. Experimental results show that EACS significantly outperforms state-of-the-art techniques on all three widely used metrics: BLEU, METEOR, and ROUGE-L.
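The data flow described in the abstract can be illustrated with a minimal sketch. This is not the EACS implementation: the abstract does not specify the models, so the line-based statement splitter, the `toy_importance` heuristic (a stand-in for the learned extractive module), and all names below are hypothetical. The sketch only shows the shape of the pipeline: an extractor selects important statements, and the abstractive module then receives the whole snippet and those statements as two parallel inputs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SummarizerInputs:
    """The two parallel inputs handed to the abstractive module."""
    full_code: str
    important_statements: List[str]


def split_statements(code: str) -> List[str]:
    # Assumption: one non-empty line is treated as one statement.
    return [line.strip() for line in code.splitlines() if line.strip()]


def toy_importance(statement: str) -> float:
    # Hypothetical heuristic standing in for a learned extractor:
    # favor return statements and lines containing a parenthesis.
    score = 0.0
    if statement.startswith("return"):
        score += 1.0
    if "(" in statement:
        score += 0.5
    return score


def extract(code: str,
            scorer: Callable[[str], float] = toy_importance,
            top_k: int = 2) -> List[str]:
    """Return the top_k highest-scoring statements in source order."""
    statements = split_statements(code)
    ranked = sorted(range(len(statements)),
                    key=lambda i: scorer(statements[i]),
                    reverse=True)
    keep = sorted(ranked[:top_k])  # restore original order
    return [statements[i] for i in keep]


def build_inputs(code: str, top_k: int = 2) -> SummarizerInputs:
    """Pair the entire snippet with its extracted statements."""
    return SummarizerInputs(full_code=code,
                            important_statements=extract(code, top_k=top_k))


snippet = "def add(a, b):\n    total = a + b\n    return total\n"
inputs = build_inputs(snippet)
print(inputs.important_statements)
```

In a real system, `toy_importance` would be replaced by a trained statement classifier, and `SummarizerInputs` would be consumed by an encoder-decoder model rather than printed.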
