Paper Title

Improving Model Understanding and Trust with Counterfactual Explanations of Model Confidence

Authors

Thao Le, Tim Miller, Ronal Singh, Liz Sonenberg

Abstract

In this paper, we show that counterfactual explanations of confidence scores help users better understand and better trust an AI model's predictions in human-subject studies. Showing confidence scores in human-agent interaction systems can help build trust between humans and AI systems. However, most existing research has only used the confidence score as a form of communication, and we still lack ways to explain why the algorithm is confident. This paper also presents two methods for understanding model confidence using counterfactual explanations: (1) based on counterfactual examples; and (2) based on visualisation of the counterfactual space.
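To make the first idea concrete, here is a minimal, hypothetical sketch, not the paper's implementation, of what a counterfactual example for model confidence might look like: given a toy logistic-regression model and a low-confidence input, greedily nudge the input until the model's confidence reaches a target level. The model weights, the greedy gradient search, and the `counterfactual_for_confidence` helper are all illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def confidence(x, w, b):
    """Model confidence: probability assigned to the predicted class."""
    p = sigmoid(np.dot(w, x) + b)
    return max(p, 1.0 - p)

def counterfactual_for_confidence(x, w, b, target_conf, step=0.05, max_iter=1000):
    """Illustrative search: nudge x along the direction that most increases
    confidence until the model's confidence reaches target_conf."""
    x_cf = x.astype(float).copy()
    for _ in range(max_iter):
        if confidence(x_cf, w, b) >= target_conf:
            break
        p = sigmoid(np.dot(w, x_cf) + b)
        # Gradient of the predicted-class probability w.r.t. x;
        # the sign flips depending on which class is predicted.
        grad = w * p * (1.0 - p) * (1.0 if p >= 0.5 else -1.0)
        x_cf += step * grad / (np.linalg.norm(grad) + 1e-12)
    return x_cf

# Toy model and a near-boundary (low-confidence) instance.
w = np.array([2.0, -1.0])
b = 0.0
x = np.array([0.1, 0.1])
x_cf = counterfactual_for_confidence(x, w, b, target_conf=0.9)
```

Comparing `x` with `x_cf` then tells the user which (and how large) feature changes would have made the model confident, which is the kind of contrastive information a counterfactual explanation of confidence conveys.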
