Paper Title
Teaching the Machine to Explain Itself using Domain Knowledge
Paper Authors
Paper Abstract
Machine Learning (ML) has been increasingly used to help humans make better and faster decisions. However, non-technical humans-in-the-loop struggle to comprehend the rationale behind model predictions, hindering trust in algorithmic decision-making systems. Considerable research work on AI explainability attempts to win back trust in AI systems by developing explanation methods, but there has been no major breakthrough yet. At the same time, popular explanation methods (e.g., LIME and SHAP) produce explanations that are very hard for non-data-scientist personas to understand. To address this, we present JOEL, a neural network-based framework to jointly learn a decision-making task and associated explanations that convey domain knowledge. JOEL is tailored to human-in-the-loop domain experts who lack deep technical ML knowledge, providing high-level insights about the model's predictions that closely resemble the experts' own reasoning. Moreover, we collect domain feedback from a pool of certified experts and use it to ameliorate the model (human teaching), hence promoting seamless and better-suited explanations. Lastly, we resort to semantic mappings between legacy expert systems and domain taxonomies to automatically annotate a bootstrap training set, overcoming the absence of concept-based human annotations. We validate JOEL empirically on a real-world fraud detection dataset. We show that JOEL can generalize the explanations from the bootstrap dataset. Furthermore, the obtained results indicate that human teaching can further improve the explanation prediction quality by approximately $13.57\%$.
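
The abstract describes JOEL as jointly learning the fraud decision and concept-based explanations drawn from a domain taxonomy, with bootstrap concept labels derived from legacy rule systems and later refinement through expert feedback. The sketch below (PyTorch) illustrates one way such a joint setup could be wired: a shared encoder feeding a decision head and a multi-label concept head, trained with a weighted sum of losses. It is not the authors' implementation; the layer sizes, concept count, loss weighting, and stand-in data are all assumptions made for illustration.

```python
# Minimal sketch (not the authors' implementation) of a jointly trained
# decision-and-explanation network in the spirit of JOEL: a shared encoder
# feeds two heads, one predicting the fraud decision and one predicting
# domain-taxonomy concepts that serve as the explanation.
# All layer sizes, the number of concepts, the loss weighting, and the
# stand-in data below are illustrative assumptions.
import torch
import torch.nn as nn

class JointDecisionExplainer(nn.Module):
    def __init__(self, num_features: int, num_concepts: int, hidden: int = 64):
        super().__init__()
        # Shared representation of the transaction features.
        self.encoder = nn.Sequential(
            nn.Linear(num_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Explanation head: multi-label scores over domain concepts.
        self.concept_head = nn.Linear(hidden, num_concepts)
        # Decision head: fraud / not-fraud logit.
        self.decision_head = nn.Linear(hidden, 1)

    def forward(self, x):
        h = self.encoder(x)
        return self.decision_head(h), self.concept_head(h)

model = JointDecisionExplainer(num_features=30, num_concepts=8)
decision_loss = nn.BCEWithLogitsLoss()   # fraud label
concept_loss = nn.BCEWithLogitsLoss()    # multi-label concept annotations
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One illustrative training step on random stand-in data; the concept labels
# would come from the semantic mapping of legacy rules (bootstrap set) or,
# later, from corrected labels provided by the expert pool (human teaching).
x = torch.randn(16, 30)
y_fraud = torch.randint(0, 2, (16, 1)).float()
y_concepts = torch.randint(0, 2, (16, 8)).float()

decision_logits, concept_logits = model(x)
loss = decision_loss(decision_logits, y_fraud) + 0.5 * concept_loss(concept_logits, y_concepts)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Under this sketch, the human-teaching step described in the abstract would amount to continuing training (fine-tuning) on expert-corrected concept labels, which is what the reported improvement in explanation prediction quality refers to.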