Paper Title
XAI for Transformers: Better Explanations through Conservative Propagation
Paper Authors
Paper Abstract
Transformers have become an important workhorse of machine learning, with numerous applications. This necessitates the development of reliable methods for increasing their transparency. Multiple interpretability methods, often based on gradient information, have been proposed. We show that the gradient in a Transformer reflects the function only locally, and thus fails to reliably identify the contribution of input features to the prediction. We identify Attention Heads and LayerNorm as main reasons for such unreliable explanations and propose a more stable way for propagation through these layers. Our proposal, which can be seen as a proper extension of the well-established LRP method to Transformers, is shown both theoretically and empirically to overcome the deficiency of a simple gradient-based approach, and achieves state-of-the-art explanation performance on a broad range of Transformer models and datasets.
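The claim that the gradient reflects the function only locally can be illustrated on LayerNorm alone. The following is a minimal sketch, not the authors' implementation: it assumes a toy model consisting of a LayerNorm without learned scale or bias followed by a linear readout, and it assumes that "conservative propagation" through LayerNorm can be obtained by treating the normalization factor as a constant during the backward pass. All function names (`layer_norm`, `readout`, `full_gradient`, `detached_gradient`) are hypothetical. Because LayerNorm is invariant to rescaling its input, Gradient x Input with the full gradient sums to approximately zero, whereas the detached variant yields relevances that sum to the model output.

```python
import numpy as np


def layer_norm(x, eps=1e-12):
    """LayerNorm without learned scale/bias (a simplifying assumption)."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)


def readout(x, w):
    """Toy model: linear readout applied to the normalized input."""
    return float(w @ layer_norm(x))


def full_gradient(x, w, h=1e-6):
    """Gradient of the toy model w.r.t. x, via central finite differences."""
    grad = np.zeros_like(x)
    for k in range(x.size):
        step = np.zeros_like(x)
        step[k] = h
        grad[k] = (readout(x + step, w) - readout(x - step, w)) / (2 * h)
    return grad


def detached_gradient(x, w, eps=1e-12):
    """Gradient when the normalization factor is held constant.

    With the scale fixed, the model is linear in x: centering is kept,
    so the gradient is the mean-centered weight vector divided by the scale.
    """
    scale = np.sqrt(x.var() + eps)
    return (w - w.mean()) / scale


rng = np.random.default_rng(0)
x = rng.normal(size=8)
w = rng.normal(size=8)

output = readout(x, w)
# Gradient x Input with the full gradient: sums to roughly zero.
gi_sum = float(np.sum(x * full_gradient(x, w)))
# Gradient x Input with the detached scale: sums to the model output.
lrp_sum = float(np.sum(x * detached_gradient(x, w)))

print(f"model output:               {output:.6f}")
print(f"sum of full Gradient x Input: {gi_sum:.6f}")
print(f"sum of detached relevances:   {lrp_sum:.6f}")
```

In this toy setting the detached scores are conservative in the sense that they add up to the quantity being explained, while the plain gradient scores carry no information about the magnitude of the output. The attention-head case is not covered by this sketch.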