Paper Title
Cross-Loss Influence Functions to Explain Deep Network Representations

Paper Authors

Silva, Andrew, Chopra, Rohit, Gombolay, Matthew

Paper Abstract


As machine learning is increasingly deployed in the real world, it is paramount that we develop the tools necessary to analyze the decision-making of the models we train and deploy to end-users. Recently, researchers have shown that influence functions, a statistical measure of sample impact, can approximate the effects of training samples on classification accuracy for deep neural networks. However, this prior work only applies to supervised learning, where training and testing share an objective function. No approaches currently exist for estimating the influence of unsupervised training examples for deep learning models. To bring explainability to unsupervised and semi-supervised training regimes, we derive the first theoretical and empirical demonstration that influence functions can be extended to handle mismatched training and testing (i.e., "cross-loss") settings. Our formulation enables us to compute the influence in an unsupervised learning setup, explain cluster memberships, and identify and augment biases in language models. Our experiments show that our cross-loss influence estimates even exceed matched-objective influence estimation relative to ground-truth sample impact.
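The "cross-loss" idea in the abstract can be sketched against the standard influence-function formulation. The equations below are an illustrative summary, not taken from the paper itself: \(\hat{\theta}\) denotes the trained parameters, \(H_{\hat{\theta}}\) the Hessian of the training objective at \(\hat{\theta}\), and \(L_{\text{train}}\), \(L_{\text{test}}\) the (possibly mismatched) training and evaluation losses.

```latex
% Matched-objective influence of a training point z on a test point z_test,
% where a single loss L is used for both training and evaluation:
\mathcal{I}(z, z_{\text{test}})
  = -\,\nabla_{\theta} L(z_{\text{test}}, \hat{\theta})^{\top}\,
      H_{\hat{\theta}}^{-1}\,
      \nabla_{\theta} L(z, \hat{\theta})

% Cross-loss sketch: the training objective L_train (e.g., an unsupervised
% loss) and the evaluation objective L_test (e.g., a supervised probe) are
% allowed to differ; the Hessian is taken with respect to the training loss:
\mathcal{I}_{\text{cross}}(z, z_{\text{test}})
  = -\,\nabla_{\theta} L_{\text{test}}(z_{\text{test}}, \hat{\theta})^{\top}\,
      H_{\hat{\theta}}^{-1}\,
      \nabla_{\theta} L_{\text{train}}(z, \hat{\theta})
```

In this reading, the cross-loss form measures how upweighting an unsupervised training example would change a downstream test criterion, which is what lets the method explain cluster memberships and probe biases in language models.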
