Title
Explaining Classifiers Trained on Raw Hierarchical Multiple-Instance Data
Authors
Abstract
Learning from raw data input, thus limiting the need for feature engineering, is a component of many successful applications of machine learning methods in various domains. While many problems naturally translate into a vector representation directly usable in standard classifiers, a number of data sources have the natural form of structured data interchange formats (e.g., security logs in JSON/XML format). Existing methods, such as those of Hierarchical Multiple Instance Learning (HMIL), allow learning from such data in its raw form. However, the explanation of classifiers trained on raw structured data remains largely unexplored. By treating the explanation of these models as a subset selection problem, we demonstrate how interpretable explanations with favourable properties can be generated using computationally efficient algorithms. We compare against an explanation technique adapted from graph neural networks and show an order-of-magnitude speed-up and higher-quality explanations.
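The subset-selection view of explanation means finding a small part of an input sample that on its own still yields the classifier's decision. The minimal Python sketch below illustrates that idea on a flat multiple-instance bag. It is not the authors' algorithm. The toy max-pooling classifier, the names `toy_classifier` and `greedy_explanation`, and the 0.5 threshold are all hypothetical assumptions. It uses plain greedy backward elimination. For a hierarchical sample the same search could presumably be repeated at each level of the tree, but that extension is not shown.

```python
from typing import Callable, List, Sequence


def toy_classifier(bag: Sequence[float]) -> float:
    """Hypothetical MIL classifier (assumption): each instance is reduced to a
    single score and the bag score is the max over instances; an empty bag
    scores 0.0."""
    return max(bag, default=0.0)


def greedy_explanation(
    bag: Sequence[float],
    score: Callable[[Sequence[float]], float],
    threshold: float = 0.5,
) -> List[int]:
    """Return indices of a subset of `bag` that still scores >= threshold.

    Greedy backward elimination: try dropping each instance once, in order,
    and keep the removal whenever the reduced bag remains positive. For a
    monotone score such as max, no single remaining instance can be dropped
    afterwards, but the subset is not guaranteed to be the smallest one.
    """
    if score(bag) < threshold:
        raise ValueError("bag is not classified positively; nothing to explain")
    kept = list(range(len(bag)))
    for i in list(kept):  # iterate over a snapshot of the original indices
        candidate = [j for j in kept if j != i]
        if score([bag[j] for j in candidate]) >= threshold:
            kept = candidate  # instance i is not needed for the decision
    return kept


if __name__ == "__main__":
    bag = [0.1, 0.7, 0.3, 0.9]
    print(greedy_explanation(bag, toy_classifier))
```

For the example bag, the loop drops instances 0, 1 and 2 in turn and keeps only index 3, whose score alone exceeds the threshold. Removing that last instance would leave an empty bag scoring 0.0. Greedy elimination needs only a linear number of classifier calls, which is one reason such searches can be cheap compared with exhaustive subset enumeration.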