Paper Title
A Diagnostic Study of Explainability Techniques for Text Classification
Paper Authors
Paper Abstract
Recent developments in machine learning have introduced models that approach human performance at the cost of increased architectural complexity. Efforts to make the rationales behind the models' predictions transparent have inspired an abundance of new explainability techniques. Provided with an already trained model, they compute saliency scores for the words of an input instance. However, there exists no definitive guide on (i) how to choose such a technique given a particular application task and model architecture, and (ii) the benefits and drawbacks of using each such technique. In this paper, we develop a comprehensive list of diagnostic properties for evaluating existing explainability techniques. We then employ the proposed list to compare a set of diverse explainability techniques on downstream text classification tasks and neural network architectures. We also compare the saliency scores assigned by the explainability techniques with human annotations of salient input regions to find relations between a model's performance and the agreement of its rationales with human ones. Overall, we find that the gradient-based explanations perform best across tasks and model architectures, and we present further insights into the properties of the reviewed explainability techniques.
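The gradient-based explanations highlighted in the abstract assign each input word a saliency score derived from the gradient of the model's output with respect to that word's embedding. The sketch below is a minimal illustration of this general idea, not the authors' code or experimental setup; the Hugging Face checkpoint name and the L2-norm aggregation of the gradient are assumptions made for the example.

```python
# Minimal sketch of a gradient-based saliency technique for text classification.
# Assumptions: a sentiment classifier checkpoint and L2-norm gradient aggregation;
# neither is prescribed by the paper.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "distilbert-base-uncased-finetuned-sst-2-english"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

text = "The plot is predictable but the acting is wonderful."
inputs = tokenizer(text, return_tensors="pt")

# Look up the input embeddings and detach them so they become leaf tensors
# whose gradients we can read after the backward pass.
embeddings = model.get_input_embeddings()(inputs["input_ids"]).detach()
embeddings.requires_grad_(True)

# Forward pass using the embeddings directly instead of token ids.
outputs = model(inputs_embeds=embeddings, attention_mask=inputs["attention_mask"])
predicted_class = int(outputs.logits.argmax(dim=-1))

# Backpropagate the score of the predicted class to the input embeddings.
outputs.logits[0, predicted_class].backward()

# Saliency of each token: L2 norm of the gradient on its embedding.
saliency = embeddings.grad[0].norm(dim=-1)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, score in zip(tokens, saliency.tolist()):
    print(f"{token:>12s}  {score:.4f}")
```

In this sketch, tokens receiving larger gradient norms are treated as more salient for the predicted class, which is the kind of word-level saliency score that the paper's diagnostic properties are designed to evaluate.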