Paper Title
Interpretable ML for Imbalanced Data
Paper Authors
Paper Abstract
Deep learning models are increasingly applied to imbalanced data in high-stakes fields such as medicine, autonomous driving, and intelligence analysis. Imbalanced data compounds the black-box nature of deep networks because the relationships between classes may be highly skewed and unclear. This can reduce model users' trust and hamper the progress of developers of imbalanced learning algorithms. Existing methods that investigate imbalanced data complexity are geared toward binary classification, shallow learning models, and low-dimensional data. In addition, current eXplainable Artificial Intelligence (XAI) techniques mainly focus on converting opaque deep learning models into simpler models (e.g., decision trees) or on mapping predictions for specific instances back to inputs, rather than examining global data properties and complexities. Therefore, there is a need for a framework tailored to modern deep networks, one that accommodates large, high-dimensional, multi-class datasets and uncovers data complexities commonly found in imbalanced data (e.g., class overlap, sub-concepts, and outlier instances). We propose a set of techniques that can be used both by deep learning model users to identify, visualize, and understand class prototypes, sub-concepts, and outlier instances, and by imbalanced learning algorithm developers to detect features and class exemplars that are key to model performance. Our framework also identifies instances that reside on the border of class decision boundaries, which can carry highly discriminative information. Unlike many existing XAI techniques, which map model decisions to gray-scale pixel locations, we use saliency through back-propagation to identify and aggregate image color bands across entire classes. Our framework is publicly available at \url{https://github.com/dd1github/XAI_for_Imbalanced_Learning}
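To make the class-level color-band saliency described in the abstract concrete, the following is a minimal sketch, not the authors' released code: the function name `class_channel_saliency` and the choice of plain input-gradient saliency are our assumptions. It back-propagates the target-class logit to the input pixels of a PyTorch image classifier and aggregates the absolute gradients per RGB channel across every image of one class.

```python
import torch

def class_channel_saliency(model, loader, target_class, device="cpu"):
    """Aggregate input-gradient saliency per color band over one class.

    Illustrative sketch: assumes `loader` yields (images, labels) batches
    of RGB tensors shaped (B, 3, H, W) and that `model` returns class logits.
    """
    model.eval().to(device)
    channel_totals = torch.zeros(3, device=device)
    n_images = 0
    for images, labels in loader:
        mask = labels == target_class
        if not mask.any():
            continue
        # Clear stale parameter gradients so repeated backward() calls
        # do not accumulate across batches.
        model.zero_grad(set_to_none=True)
        x = images[mask].to(device).requires_grad_(True)
        logits = model(x)
        # Back-propagate the target-class logit to the input pixels.
        logits[:, target_class].sum().backward()
        # |d logit / dx| has shape (B, 3, H, W); average over pixel
        # locations, then sum over the batch to accumulate per channel.
        channel_totals += x.grad.abs().mean(dim=(2, 3)).sum(dim=0)
        n_images += x.shape[0]
    return (channel_totals / max(n_images, 1)).cpu()  # mean saliency per band
```

In this sketch, comparing the three returned values indicates which color band carries the most gradient mass for a class as a whole, rather than mapping a single prediction to grayscale pixel locations as many instance-level XAI methods do.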