Paper Title
MultiCoder: Multi-Programming-Lingual Pre-Training for Low-Resource Code Completion
Paper Authors
Paper Abstract
Code completion is a valuable topic in both academia and industry. Recently, large-scale mono-programming-lingual (MonoPL) pre-training models have been proposed to boost the performance of code completion. However, code completion on low-resource programming languages (PLs) is difficult for the data-driven paradigm, even though plenty of developers use low-resource PLs. On the other hand, few studies have explored the effects of multi-programming-lingual (MultiPL) pre-training on code completion, especially the impact on low-resource programming languages. To this end, we propose MultiCoder to enhance low-resource code completion via MultiPL pre-training and MultiPL Mixture-of-Experts (MoE) layers. We further propose a novel PL-level MoE routing strategy (PL-MoE) for improving code completion on all PLs. Experimental results on CodeXGLUE and MultiCC demonstrate that 1) the proposed MultiCoder significantly outperforms the MonoPL baselines on low-resource programming languages, and 2) the PL-MoE module further boosts the performance on six programming languages. In addition, we analyze the effects of the proposed method in detail and explore the effectiveness of our method in a variety of scenarios.
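The core mechanism named in the abstract is a PL-level MoE routing strategy: rather than routing each token independently through a gating network, all tokens from the same programming language are dispatched to a language-specific expert. The following is a minimal, hedged sketch of that idea; the class name, expert parameterization, and numpy formulation are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

class PLMoELayer:
    """Toy Mixture-of-Experts layer with PL-level routing (illustrative sketch).

    Assumption: each expert is a single linear transform; the real PL-MoE
    presumably uses feed-forward experts inside a Transformer block.
    """

    def __init__(self, num_experts: int, d_model: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # One weight matrix per expert, scaled for stable magnitudes.
        self.experts = [
            rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
            for _ in range(num_experts)
        ]

    def forward(self, x: np.ndarray, pl_id: int) -> np.ndarray:
        # x: (seq_len, d_model) hidden states for one sequence.
        # PL-level routing: the expert index is determined by the sequence's
        # programming-language id, so every token in the sequence shares
        # the same expert (unlike token-level top-k gating).
        expert = self.experts[pl_id % len(self.experts)]
        return x @ expert

layer = PLMoELayer(num_experts=6, d_model=8)
x = np.ones((4, 8))
y_python = layer.forward(x, pl_id=0)  # all 4 tokens routed to expert 0
y_go = layer.forward(x, pl_id=1)      # all 4 tokens routed to expert 1
```

Routing by PL id makes the dispatch deterministic per language: repeated calls with the same `pl_id` always hit the same expert, which is the property that lets each expert specialize in one language's syntax and idioms.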