Paper Title
What Programs Want: Automatic Inference of Input Data Specifications
Authors
Abstract
Nowadays, as machine-learned software quickly permeates our society, we are becoming increasingly vulnerable to programming errors in the data pre-processing or training software, as well as errors in the data itself. In this paper, we propose a static shape analysis framework for the input data of data-processing programs. Our analysis automatically infers necessary conditions on the structure and values of the data read by a data-processing program. Our framework builds on a family of underlying abstract domains, extended to reason indirectly about the input data rather than only about the program variables. The choice of these abstract domains is a parameter of the analysis. We describe various instances built from existing abstract domains. The proposed approach is implemented in an open-source static analyzer for Python programs. We demonstrate its potential on a number of representative examples.
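To make "necessary conditions on the structure and values of the data" concrete, the sketch below shows a small, hypothetical data-processing program. It is not taken from the paper, and the function name `average_scores` and the 0 to 100 range are assumptions chosen for illustration. The comments list, by hand, the kind of input conditions one would hope such an analysis could infer; they are not output of the authors' analyzer.

```python
import io


def average_scores(stream):
    """Read a count, then that many integer scores, and return their mean.

    Hypothetical example. Implicit assumptions on the input data:
      * line 1 parses as an integer `count`, and `count != 0`;
      * if `count > 0`, the next `count` lines each parse as an integer;
      * each of those integers lies in the range [0, 100].
    """
    count = int(stream.readline())        # ValueError if line 1 is not an integer
    total = 0
    for _ in range(count):
        score = int(stream.readline())    # ValueError if a line is missing or malformed
        if score < 0 or score > 100:
            raise ValueError("score out of range")
        total += score
    return total / count                  # ZeroDivisionError if count == 0


# Well-formed input: a count of 2 followed by two in-range scores.
result = average_scores(io.StringIO("2\n10\n20\n"))
```

Inputs that violate any of the listed conditions, such as a count of zero or fewer score lines than announced, make the program raise an exception at run time. A static analysis of the kind described in the abstract aims to surface such conditions before the program is run on real data.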