Am I Building a White Box Agent or Interpreting a Black Box Agent?

Title

Am I Building a White Box Agent or Interpreting a Black Box Agent?

Author

Bewley, Tom

Abstract

The rule extraction literature contains the notion of a fidelity-accuracy dilemma: when building an interpretable model of a black box function, optimising for fidelity is likely to reduce performance on the underlying task, and vice versa. I reassert the relevance of this dilemma for the modern field of explainable artificial intelligence, and highlight how it is compounded when the black box is an agent interacting with a dynamic environment. I then discuss two independent research directions - building white box agents and interpreting black box agents - which are both coherent and worthy of attention, but must not be conflated by researchers embarking on projects in the domain of agent interpretability.
