Paper title
Multi-View Graph Representation for Programming Language Processing: An Investigation into Algorithm Detection
Paper authors
Paper abstract
Program representation, which aims to convert program source code into vectors with automatically extracted features, is a fundamental problem in programming language processing (PLP). Recent work tries to represent programs with neural networks based on source code structure. However, such methods often focus on syntax and consider only a single view of a program, which limits the representation power of the models. This paper proposes a multi-view graph (MVG) program representation method. MVG pays more attention to code semantics and includes both data flow and control flow as separate views. These views are then combined and processed by a graph neural network (GNN) to obtain a comprehensive program representation that covers multiple aspects. We thoroughly evaluate the proposed MVG approach on algorithm detection, an important and challenging subfield of PLP. Specifically, we use the public dataset POJ-104 and also construct a new, challenging dataset, ALG-109, to test our method. In experiments, MVG significantly outperforms previous methods, demonstrating the model's strong capability for representing source code.
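To make the multi-view idea concrete, the following is a minimal sketch of how several views over the same program nodes might be merged and passed through one round of graph message passing. It is not the paper's implementation. The toy program, the edge lists, the scalar per-view weights, and all function names (`merge_views`, `message_pass`, `readout`, `represent`) are assumptions chosen for illustration, and a real GNN would use learned weight matrices and several layers.

```python
from collections import defaultdict

# Hypothetical toy program: each node is one statement.
NODES = ["x = 1", "y = x + 2", "if y > 2", "print(y)"]

# Hypothetical views over the same nodes, each a list of directed
# (source, target) edges. The view names follow the abstract.
VIEWS = {
    "control_flow": [(0, 1), (1, 2), (2, 3)],
    "data_flow": [(0, 1), (1, 2), (1, 3)],
}


def merge_views(views):
    """Combine per-view edge lists into one list of typed edges."""
    return [(src, dst, name) for name, edges in views.items() for src, dst in edges]


def message_pass(features, typed_edges, view_weights):
    """One round of message passing.

    For each node and each view, average the features of the node's
    in-neighbours in that view, scale by the view's weight, and add the
    result to the node's own feature vector.
    """
    incoming = defaultdict(list)  # (target, view) -> list of source features
    for src, dst, view in typed_edges:
        incoming[(dst, view)].append(features[src])

    updated = []
    for node, feat in enumerate(features):
        new_feat = list(feat)
        for view, weight in view_weights.items():
            msgs = incoming.get((node, view), [])
            if not msgs:
                continue
            for i in range(len(feat)):
                mean_i = sum(m[i] for m in msgs) / len(msgs)
                new_feat[i] += weight * mean_i
        updated.append(new_feat)
    return updated


def readout(features):
    """Mean-pool node features into a single program vector."""
    n = len(features)
    return [sum(f[i] for f in features) / n for i in range(len(features[0]))]


def represent(features, views, view_weights):
    """Merge the views, run one message-passing round, and pool."""
    return readout(message_pass(features, merge_views(views), view_weights))
```

Calling `represent` with one feature vector per entry in `NODES` and a weight for each key of `VIEWS` returns a single fixed-length vector for the whole program, which a downstream classifier could then use for a task such as algorithm detection. Keeping the views as typed edges over a shared node set is one possible design; it lets each view contribute its own messages while the nodes stay aligned across views.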