Title
Graph-Based Fuzz Testing for Deep Learning Inference Engine
Authors
Abstract
With the wide use of Deep Learning (DL) systems, academia and industry have begun to pay attention to their quality. Testing is one of the major methods of quality assurance. However, existing testing techniques focus on the quality of DL models and pay little attention to the core underlying inference engines (i.e., frameworks and libraries). Inspired by the success of fuzz testing, we design a graph-based fuzz testing method to improve the quality of DL inference engines. The method naturally follows the graph structure of DL models. We introduce a novel operator-level coverage criterion based on graph theory and implement six different mutations that generate diversified DL models by exploring combinations of model structures, parameters, and data inputs. Monte Carlo Tree Search (MCTS) drives DL model generation without requiring a training process. The experimental results show that MCTS outperforms a random method in boosting operator-level coverage and detecting exceptions. Our method has discovered more than 40 different exceptions across three types of undesired behavior: model conversion failure, inference failure, and output comparison failure. The mutation strategies are useful for generating new valid test inputs, yielding up to 8.2% more operator-level coverage on average and capturing 8.6 more exceptions.
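The abstract does not define the operator-level coverage criterion. As a rough illustration only, the sketch below assumes each generated model is a directed graph whose nodes are operator types, and measures coverage as two graph-based ratios: the fraction of a hypothetical operator vocabulary (`OPS`) that appears in any model, and the fraction of ordered producer/consumer operator pairs that appear as edges. The names `OPS` and `operator_coverage` and both ratios are assumptions for illustration, not the paper's actual criterion.

```python
from itertools import product

# Hypothetical operator vocabulary of an inference engine (assumption).
OPS = ["Conv", "Relu", "Add", "MaxPool", "Concat"]


def operator_coverage(models, ops=OPS):
    """Illustrative operator-level coverage over a set of model graphs.

    Each model is a (nodes, edges) pair: nodes is a list of operator types,
    edges is a list of (producer_op, consumer_op) pairs.
    Returns (node_coverage, edge_coverage), both in [0, 1].
    This is a sketch, NOT the paper's exact definition.
    """
    vocab = set(ops)
    all_pairs = set(product(ops, repeat=2))  # every ordered operator pair
    seen_ops, seen_pairs = set(), set()
    for nodes, edges in models:
        seen_ops.update(nodes)
        seen_pairs.update(edges)
    node_cov = len(seen_ops & vocab) / len(vocab)
    edge_cov = len(seen_pairs & all_pairs) / len(all_pairs)
    return node_cov, edge_cov


if __name__ == "__main__":
    # A seed model: Conv -> Relu -> MaxPool.
    seed = (["Conv", "Relu", "MaxPool"],
            [("Conv", "Relu"), ("Relu", "MaxPool")])
    # A hypothetical structural mutant adding a residual-style Add.
    mutant = (["Conv", "Relu", "Add"],
              [("Conv", "Relu"), ("Conv", "Add"), ("Relu", "Add")])
    print(operator_coverage([seed]))
    print(operator_coverage([seed, mutant]))
```

Counting operator pairs as well as single operators rewards mutants that connect already-seen operators in new ways, which is one plausible reading of a coverage criterion "based on graph theory"; a search strategy such as MCTS could then use the coverage gain of each mutant as its reward.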