Paper Title

Rashomon Capacity: A Metric for Predictive Multiplicity in Classification

Authors

Hsu, Hsiang; Calmon, Flavio du Pin

Abstract

Predictive multiplicity occurs when classification models with statistically indistinguishable performances assign conflicting predictions to individual samples. When used for decision-making in applications of consequence (e.g., lending, education, criminal justice), models developed without regard for predictive multiplicity may result in unjustified and arbitrary decisions for specific individuals. We introduce a new metric, called Rashomon Capacity, to measure predictive multiplicity in probabilistic classification. Prior metrics for predictive multiplicity focus on classifiers that output thresholded (i.e., 0-1) predicted classes. In contrast, Rashomon Capacity applies to probabilistic classifiers, capturing more nuanced score variations for individual samples. We provide a rigorous derivation for Rashomon Capacity, argue its intuitive appeal, and demonstrate how to estimate it in practice. We show that Rashomon Capacity yields principled strategies for disclosing conflicting models to stakeholders. Our numerical experiments illustrate how Rashomon Capacity captures predictive multiplicity in various datasets and learning models, including neural networks. The tools introduced in this paper can help data scientists measure and report predictive multiplicity prior to model deployment.
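The abstract does not state the formula. The sketch below shows one way to compute a per-sample score of this kind. It rests on our own reading, which is an assumption and is not quoted from the text above: Rashomon Capacity of a single sample is 2 raised to the capacity (in bits) of a discrete channel. The channel's inputs are the competing models with statistically indistinguishable performance. Its outputs are the classes. Row j of the channel matrix is the class-probability vector that model j assigns to the sample. The capacity is computed with the standard Blahut-Arimoto iteration, and how the competing models are obtained is not shown. The helper names `channel_capacity` and `rashomon_capacity` and the toy score matrices are hypothetical.

```python
import numpy as np


def channel_capacity(W, tol=1e-9, max_iter=10_000):
    """Blahut-Arimoto estimate of the capacity (in nats) of a discrete channel.

    W has shape (m, c); row j is P(y | input j) and must sum to 1.
    """
    W = np.asarray(W, dtype=float)
    m = W.shape[0]
    p = np.full(m, 1.0 / m)  # start from a uniform distribution over inputs (models)
    lower = 0.0
    for _ in range(max_iter):
        q = p @ W  # output (class) distribution induced by p
        # Per-row KL divergence D(W[j] || q), using the convention 0 * log(0 / q) = 0.
        safe_q = np.where(q > 0, q, 1.0)
        ratio = np.where(W > 0, W / safe_q, 1.0)
        d = np.sum(W * np.log(ratio), axis=1)
        lower = float(p @ d)    # mutual information under p: a lower bound on capacity
        upper = float(d.max())  # max_j D(W[j] || q): an upper bound on capacity
        if upper - lower < tol:
            break
        p = p * np.exp(d)  # Blahut-Arimoto reweighting of the inputs
        p /= p.sum()
    return lower


def rashomon_capacity(scores):
    """Hypothetical helper (assumed definition): 2 ** (capacity in bits),
    which equals exp(capacity in nats)."""
    return float(np.exp(channel_capacity(scores)))


# Hypothetical score vectors that competing models assign to one sample.
agree = [[0.7, 0.3], [0.7, 0.3]]       # identical scores: no multiplicity
flip = [[1.0, 0.0], [0.0, 1.0]]        # thresholded predictions disagree completely
nuanced = [[0.9, 0.1], [0.55, 0.45]]   # same predicted class, different confidence

for name, s in [("agree", agree), ("flip", flip), ("nuanced", nuanced)]:
    same_class = len(set(np.argmax(s, axis=1))) == 1
    print(name, "same thresholded class:", same_class,
          "RC:", round(rashomon_capacity(s), 3))
```

Under this assumed definition, the score equals 1 when every model outputs the same probability vector. It can be at most the smaller of the number of classes and the number of models. The `nuanced` case shows the kind of score variation that a metric built only on thresholded 0-1 predictions would count as no conflict.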