Calibrated Selective Classification

Paper Title

Calibrated Selective Classification

Authors

Adam Fisch, Tommi Jaakkola, Regina Barzilay

Abstract

Selective classification allows models to abstain from making predictions (e.g., to say "I don't know") when in doubt, in order to obtain better effective accuracy. While typical selective models can be effective at producing more accurate predictions on average, they may still allow wrong predictions that have high confidence, or skip correct predictions that have low confidence. Providing calibrated uncertainty estimates alongside predictions (probabilities that correspond to true frequencies) can be as important as having predictions that are simply accurate on average. However, uncertainty estimates can be unreliable for certain inputs. In this paper, we develop a new approach to selective classification in which we propose a method for rejecting examples with "uncertain" uncertainties. By doing so, we aim to make predictions with well-calibrated uncertainty estimates over the distribution of accepted examples, a property we call selective calibration. We present a framework for learning selectively calibrated models, where a separate selector network is trained to improve the selective calibration error of a given base model. In particular, our work focuses on achieving robust calibration, where the model is intentionally designed to be tested on out-of-domain data. We achieve this through a training strategy inspired by distributionally robust optimization, in which we apply simulated input perturbations to the known, in-domain training data. We demonstrate the empirical effectiveness of our approach on multiple image classification and lung cancer risk assessment tasks.
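To make "selective calibration" concrete, the sketch below measures calibration error only over the examples a selector accepts. It assumes equal-width binned expected calibration error (ECE) as the calibration metric and a selector that keeps a fixed fraction (the coverage) of examples with the highest selector scores. The function names `binned_ece` and `selective_calibration_error` are hypothetical, and the paper's own estimator may differ. The sketch also does not implement the trained selector network or the DRO-inspired perturbation training; it only scores a given set of selector outputs.

```python
from typing import Sequence


def binned_ece(confidences: Sequence[float], correct: Sequence[int], n_bins: int = 10) -> float:
    """Equal-width binned expected calibration error (an assumed choice of metric)."""
    n = len(confidences)
    if n == 0:
        return 0.0
    total = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are [lo, hi); the last bin also includes a confidence of exactly 1.0.
        idx = [i for i, c in enumerate(confidences)
               if lo <= c < hi or (b == n_bins - 1 and c == hi)]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        accuracy = sum(correct[i] for i in idx) / len(idx)
        # Weight each bin's |confidence - accuracy| gap by its share of examples.
        total += len(idx) / n * abs(avg_conf - accuracy)
    return total


def selective_calibration_error(confidences: Sequence[float], correct: Sequence[int],
                                selector_scores: Sequence[float], coverage: float,
                                n_bins: int = 10) -> float:
    """Hypothetical helper: binned ECE over only the examples the selector accepts.

    The selector keeps the `coverage` fraction of examples with the highest
    selector scores and abstains on the rest.
    """
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    k = max(1, round(coverage * len(confidences)))
    # Rank examples by selector score, highest first, and accept the top k.
    order = sorted(range(len(confidences)), key=lambda i: selector_scores[i], reverse=True)
    accepted = order[:k]
    return binned_ece([confidences[i] for i in accepted],
                      [correct[i] for i in accepted], n_bins)


# Usage with toy numbers (illustrative only, not data from the paper).
conf = [0.95, 0.95, 0.65, 0.65]   # base model confidences
hit = [1, 1, 0, 1]                # 1 if the base model's prediction was correct
score = [0.8, 0.7, 0.1, 0.2]      # selector scores (higher means "accept")
print(round(selective_calibration_error(conf, hit, score, coverage=1.0), 3))  # 0.1
print(round(selective_calibration_error(conf, hit, score, coverage=0.5), 3))  # 0.05
```

In this toy case the selector ranks the two confident, correct predictions highest, so rejecting the other half lowers the calibration error over accepted examples from about 0.10 to about 0.05. A learned selector would be trained to make that reduction hold on new, possibly shifted inputs.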
