Learning to Encode and Classify Test Executions

Title

Learning to Encode and Classify Test Executions

Authors

Foivos Tsimpourlas, Ajitha Rajan, Miltiadis Allamanis

Abstract

Automatically determining the correctness of test executions is known as the test oracle problem, and it is one of the key remaining issues for automated testing. The goal of this paper is to solve the test oracle problem in a way that is general, scalable and accurate. To achieve this, we use supervised learning over test execution traces. We label a small fraction of the execution traces with their verdict of pass or fail. We use the labelled traces to train a neural network (NN) model to distinguish runtime patterns of passing versus failing executions for a given program. Our approach for building this NN model involves the following steps: (1) instrument the program to record execution traces as sequences of method invocations and global state; (2) label a small fraction of the execution traces with their verdicts; (3) design an NN component that embeds information in execution traces into fixed-length vectors; (4) design an NN model that uses the trace information for classification; (5) evaluate the inferred classification model on unseen execution traces from the program. We evaluate our approach using case studies from different application domains: (1) a module from the Ethereum blockchain; (2) a module from the PyTorch deep learning framework; (3) Microsoft SEAL encryption library components; (4) the sed stream editor; (5) a value pointer library; and (6) nine network protocols from the Linux packet identifier, L7-Filter. We found that the classification models for all subject programs achieved high precision, recall and specificity, over 95%, while training on an average of only 9% of the total traces. Our experiments show that the proposed neural network model is highly effective as a test oracle and is able to learn runtime patterns that distinguish passing from failing test executions for systems and tests from different application domains.
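The three evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `oracle_metrics`, the string labels, and the choice to treat "fail" as the positive class are assumptions made purely for illustration, and the verdicts below are made-up toy data rather than results from the paper.

```python
from typing import Dict, Sequence


def oracle_metrics(
    actual: Sequence[str],
    predicted: Sequence[str],
    positive: str = "fail",  # assumption: a failing execution is the positive class
) -> Dict[str, float]:
    """Compute precision, recall and specificity for pass/fail verdicts."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    tp = fp = tn = fn = 0
    for truth, guess in zip(actual, predicted):
        if guess == positive and truth == positive:
            tp += 1  # failing trace correctly flagged as failing
        elif guess == positive:
            fp += 1  # passing trace wrongly flagged as failing
        elif truth == positive:
            fn += 1  # failing trace missed by the classifier
        else:
            tn += 1  # passing trace correctly accepted

    def ratio(numerator: int, denominator: int) -> float:
        # Report 0.0 instead of dividing by zero when a class is absent.
        return numerator / denominator if denominator else 0.0

    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Toy verdicts for eight unseen traces (hypothetical data).
actual = ["fail", "fail", "fail", "pass", "pass", "pass", "pass", "pass"]
predicted = ["fail", "fail", "pass", "pass", "pass", "pass", "pass", "fail"]
metrics = oracle_metrics(actual, predicted)
print(metrics)
```

With "fail" as the positive class, recall measures how many failing executions the oracle catches, while specificity measures how many passing executions it correctly accepts. Both matter for a test oracle, since missed failures hide bugs and false alarms waste developer time.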
