Paper Title

ReACC: A Retrieval-Augmented Code Completion Framework

Authors

Shuai Lu, Nan Duan, Hojae Han, Daya Guo, Seung-won Hwang, Alexey Svyatkovskiy

Abstract

Code completion, which aims to predict the following code token(s) according to the code context, can improve the productivity of software development. Recent work has shown that statistical language modeling with transformers can greatly improve performance on the code completion task by learning from large-scale source code datasets. However, current approaches focus only on code context within the file or project, i.e., internal context. Our distinction is utilizing "external" context, inspired by the human behavior of copying from related code snippets when writing code. Specifically, we propose a retrieval-augmented code completion framework that leverages both lexical copying and referring to code with similar semantics by retrieval. We adopt a stage-wise training approach that combines a source code retriever and an auto-regressive language model for programming languages. We evaluate our approach on the code completion task in the Python and Java programming languages, achieving state-of-the-art performance on the CodeXGLUE benchmark.
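The retrieve-then-complete idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a toy in-memory corpus, a crude token-set Jaccard score standing in for the lexical side of retrieval, and simple prepending of the retrieved snippet to the unfinished code. The names `tokenize`, `jaccard`, `retrieve`, `build_prompt`, and `CORPUS` are hypothetical. The semantic retriever and the language model itself are left out; the resulting prompt is what would be handed to an auto-regressive model.

```python
import re


def tokenize(code: str) -> set[str]:
    """Return the set of identifier and number tokens in a code string."""
    return set(re.findall(r"[A-Za-z_]\w*|\d+", code))


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-set overlap in [0, 1]; an assumed stand-in for lexical similarity."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve(query: str, corpus: list[str]) -> str:
    """Pick the corpus snippet whose tokens overlap most with the query."""
    query_tokens = tokenize(query)
    return max(corpus, key=lambda snippet: jaccard(query_tokens, tokenize(snippet)))


def build_prompt(unfinished: str, corpus: list[str]) -> str:
    """Prepend the retrieved snippet as external context (assumed layout)."""
    return retrieve(unfinished, corpus) + "\n" + unfinished


# Hypothetical external code base.
CORPUS = [
    "def read_json(path):\n    with open(path) as f:\n        return json.load(f)",
    "def add(a, b):\n    return a + b",
]

unfinished = "def load_config(path):\n    with open(path) as f:\n        return"
prompt = build_prompt(unfinished, CORPUS)
print(prompt)
```

Here the file-reading snippet shares more tokens with the unfinished function than the arithmetic one, so it is selected as external context. A completion model conditioned on this prompt could then copy from the retrieved code when predicting the next tokens.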
