Code Generation for Unknown Libraries via Reading API Documentations

Paper Title

Code Generation for Unknown Libraries via Reading API Documentations

Authors

Washio, Koki; Miyao, Yusuke

Abstract

Open-domain code generation is a challenging problem because the set of functions and classes that we use is frequently changed and extended in programming communities. We consider the challenge of code generation for unknown libraries without additional training. In this paper, we explore a code generation framework that can refer to relevant API documentation, as human programmers do, to handle unknown libraries. As a first step in this direction, we implement a model that extracts relevant code signatures from API documentation based on a natural language intent and copies primitives from the extracted signatures. Moreover, to evaluate code generation for unknown libraries and our framework, we extend an existing dataset of open-domain code generation and resplit it so that the evaluation data consist only of examples using libraries that do not appear in the training data. Experiments on our new split show that, as expected, baseline encoder-decoder models cannot generate code using primitives of unknown libraries. In contrast, our model outperforms the baseline on the new split and can properly generate unknown primitives when the extracted code signatures are noiseless.
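The abstract does not spell out how the library-disjoint resplit or the signature extraction is implemented. The sketch below illustrates both ideas under loudly stated assumptions: each example is a hypothetical record with `intent`, `snippet`, and a `libraries` set, and candidate signatures are ranked by simple word overlap with the intent. The word-overlap ranker is a stand-in chosen for illustration, not the authors' extractor, and the copy mechanism is not sketched at all.

```python
import re
from typing import Any, Dict, List, Set, Tuple

# Hypothetical example record: {"intent": str, "snippet": str, "libraries": set of str}
Example = Dict[str, Any]


def resplit_by_library(
    examples: List[Example], held_out: Set[str]
) -> Tuple[List[Example], List[Example]]:
    """Split examples so that no library in `held_out` appears in the training data."""
    train: List[Example] = []
    test: List[Example] = []
    for ex in examples:
        if set(ex["libraries"]) & held_out:
            test.append(ex)  # uses at least one held-out (unseen) library
        else:
            train.append(ex)  # uses only libraries allowed in training
    return train, test


def tokens(text: str) -> Set[str]:
    """Lowercased alphabetic tokens; `read_csv` becomes {"read", "csv"}."""
    return set(re.findall(r"[a-z]+", text.lower()))


def rank_signatures(intent: str, signatures: List[str], k: int = 3) -> List[str]:
    """Return the k signatures sharing the most tokens with the intent.

    Word overlap is an illustrative stand-in, not the paper's extractor.
    Ties keep their original order because sorted() is stable.
    """
    query = tokens(intent)
    ranked = sorted(signatures, key=lambda sig: len(query & tokens(sig)), reverse=True)
    return ranked[:k]


# Toy data for illustration only (not taken from the paper's dataset).
examples = [
    {"intent": "sort a list", "snippet": "sorted(x)", "libraries": set()},
    {"intent": "read a csv file", "snippet": "pandas.read_csv(f)", "libraries": {"pandas"}},
    {"intent": "parse a json string", "snippet": "json.loads(s)", "libraries": {"json"}},
]
signatures = [
    "pandas.read_csv(filepath_or_buffer, sep=',')",
    "pandas.DataFrame.to_json(path_or_buf=None)",
    "pandas.concat(objs, axis=0)",
]

train, test = resplit_by_library(examples, held_out={"pandas"})
print(len(train), len(test))  # 2 1
print(rank_signatures("read a csv file", signatures, k=1))  # ["pandas.read_csv(filepath_or_buffer, sep=',')"]
```

Sending every example that touches a held-out library to the test side guarantees those libraries never occur in training; whether the authors' split applies further filtering to test examples is not stated in the abstract.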
