Paper title
Variable importance without impossible data
Paper authors
Paper abstract
The most popular methods for measuring the importance of the variables in a black box prediction algorithm make use of synthetic inputs that combine predictor variables from multiple subjects. These inputs can be unlikely, physically impossible, or even logically impossible. As a result, the predictions for such cases can be based on data very unlike any the black box was trained on. We think that users cannot trust an explanation of the decision of a prediction algorithm when the explanation uses such values. Instead, we advocate a method called Cohort Shapley that is grounded in economic game theory and, unlike most other game theoretic methods, uses only actually observed data to quantify variable importance. Cohort Shapley works by narrowing the cohort of subjects judged to be similar to a target subject on one or more features. We illustrate it on an algorithmic fairness problem where it is essential to attribute importance to protected variables that the model was not trained on.
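The cohort-narrowing idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the value of a feature subset S is the mean prediction over the observed subjects that are similar to the target on every feature in S, that similarity defaults to exact equality, and that the number of features is small enough for exact enumeration of subsets. The names `cohort_shapley`, `similar`, and the toy data are hypothetical.

```python
from itertools import combinations
from math import factorial

import numpy as np


def cohort_shapley(X, y, target, similar=None):
    """Exact cohort Shapley attributions for one target subject (sketch).

    X: (n, d) observed features; y: (n,) predictions or outcomes;
    target: row index of the subject being explained.
    similar(column, value) -> boolean mask; assumed reflexive, so the
    target always belongs to its own cohort. Defaults to exact equality.
    """
    X = np.asarray(X)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    if similar is None:
        similar = lambda column, value: column == value

    # sim[i, j] is True when subject i is similar to the target on feature j.
    sim = np.column_stack([similar(X[:, j], X[target, j]) for j in range(d)])

    def value(S):
        # Cohort: subjects similar to the target on every feature in S.
        # For empty S the cohort is all observed subjects.
        in_cohort = np.all(sim[:, list(S)], axis=1)
        return y[in_cohort].mean()

    phi = np.zeros(d)
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for size in range(d):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for S in combinations(others, size):
                phi[j] += weight * (value(S + (j,)) - value(S))
    return phi


# Toy data (hypothetical): two binary features, four observed subjects.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([1.0, 2.0, 3.0, 4.0])
phi = cohort_shapley(X, y, target=3)
print(phi)
```

Only observed rows of `X` enter the computation, so no synthetic inputs are ever passed to the model. Because `y` holds precomputed predictions, `X` may also include columns the model never saw, such as a protected variable. On the toy data the attributions are 1.0 and 0.5, which sum to the gap between the target's fully narrowed cohort mean (4.0) and the overall mean (2.5).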