Paper Title

Estimating Model Performance under Domain Shifts with Class-Specific Confidence Scores

Authors

Zeju Li, Konstantinos Kamnitsas, Mobarakol Islam, Chen Chen, Ben Glocker

Abstract

Machine learning models are typically deployed in a test setting that differs from the training setting, potentially leading to decreased model performance because of domain shift. If we could estimate the performance that a pre-trained model would achieve on data from a specific deployment setting, for example a certain clinic, we could judge whether the model could safely be deployed or whether its performance degrades unacceptably on that data. Existing approaches estimate this from the confidence of predictions made on unlabeled test data from the deployment domain. We find that existing methods struggle with data that present class imbalance, because the methods used to calibrate confidence do not account for the bias induced by class imbalance and consequently fail to estimate class-wise accuracy. Here, we introduce class-wise calibration within the framework of performance estimation for imbalanced datasets. Specifically, we derive class-specific modifications of state-of-the-art confidence-based model evaluation methods, including temperature scaling (TS), difference of confidences (DoC), and average thresholded confidence (ATC). We also extend the methods to estimate the Dice similarity coefficient (DSC) in image segmentation. We conduct experiments on four tasks and find that the proposed modifications consistently improve the estimation accuracy for imbalanced datasets. Compared with prior methods, our methods improve accuracy estimation by 18% in classification under natural domain shifts and double the estimation accuracy on segmentation tasks.
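To make the thresholding idea concrete, here is a minimal Python sketch of average thresholded confidence (ATC) and one possible class-specific variant. It is not the authors' implementation: the function names, the use of a single confidence score per sample (for example the maximum softmax probability), the quantile-based threshold fit, and the choice to group samples by predicted class are all assumptions made for illustration.

```python
import numpy as np


def fit_atc_threshold(conf, correct):
    """Fit a threshold on labeled source data (illustrative helper).

    The threshold is chosen so that the share of source samples whose
    confidence exceeds it roughly matches the source accuracy.
    """
    acc = np.asarray(correct).mean()
    # The (1 - acc) quantile leaves about an `acc` fraction of samples above it.
    return float(np.quantile(conf, 1.0 - acc))


def atc_estimate(conf_target, threshold):
    """Estimate accuracy on unlabeled target data as the share above threshold."""
    return float((np.asarray(conf_target) > threshold).mean())


def fit_classwise_thresholds(conf, pred, correct, n_classes, global_threshold):
    """Fit one threshold per predicted class (assumed grouping, see lead-in).

    Classes that never appear among the source predictions fall back to
    the global threshold.
    """
    thresholds = np.full(n_classes, global_threshold, dtype=float)
    for c in range(n_classes):
        mask = pred == c
        if mask.any():
            thresholds[c] = fit_atc_threshold(conf[mask], correct[mask])
    return thresholds


def classwise_atc_estimate(conf_target, pred_target, thresholds):
    """Estimate target accuracy using the threshold of each predicted class."""
    # `pred_target` must be an integer array so it can index `thresholds`.
    return float((conf_target > thresholds[pred_target]).mean())
```

Under these assumptions, the thresholds would be fitted once on labeled source validation data, and `classwise_atc_estimate` would then be applied to confidences and predictions from unlabeled deployment data. Fitting a separate threshold per class is meant to keep frequent classes from dominating the calibration, which is the failure mode the abstract attributes to class imbalance.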
