Interpretable Off-Policy Learning via Hyperbox Search

Paper Title

Interpretable Off-Policy Learning via Hyperbox Search

Authors

Tschernutter, Daniel; Hatt, Tobias; Feuerriegel, Stefan

Abstract

Personalized treatment decisions have become an integral part of modern medicine. Thereby, the aim is to make treatment decisions based on individual patient characteristics. Numerous methods have been developed for learning such policies from observational data that achieve the best outcome across a certain policy class. Yet these methods are rarely interpretable. However, interpretability is often a prerequisite for policy learning in clinical practice. In this paper, we propose an algorithm for interpretable off-policy learning via hyperbox search. In particular, our policies can be represented in disjunctive normal form (i.e., OR-of-ANDs) and are thus intelligible. We prove a universal approximation theorem that shows that our policy class is flexible enough to approximate any measurable function arbitrarily well. For optimization, we develop a tailored column generation procedure within a branch-and-bound framework. Using a simulation study, we demonstrate that our algorithm outperforms state-of-the-art methods from interpretable off-policy learning in terms of regret. Using real-world clinical data, we perform a user study with actual clinical experts, who rate our policies as highly interpretable.
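
To make the "OR-of-ANDs" representation concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a binary treatment decision encoded as 0/1 and a policy that treats a patient whenever the feature vector falls inside at least one axis-aligned hyperbox. The names Hyperbox and hyperbox_policy, the two features (age, systolic blood pressure), and all thresholds are hypothetical. Each box is an AND of per-feature interval conditions, and the policy is the OR over boxes.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Hyperbox:
    """Axis-aligned box: lower[j] <= x[j] <= upper[j] for every feature j (hypothetical helper)."""
    lower: Sequence[float]
    upper: Sequence[float]

    def contains(self, x: Sequence[float]) -> bool:
        # AND over features: every coordinate must lie inside its interval.
        return all(lo <= xi <= hi for lo, xi, hi in zip(self.lower, x, self.upper))


def hyperbox_policy(boxes: Sequence[Hyperbox], x: Sequence[float]) -> int:
    """Return 1 (treat) if x lies in at least one box, else 0 (assumed 0/1 encoding)."""
    # OR over boxes: a single matching box is enough to assign treatment.
    return int(any(box.contains(x) for box in boxes))


# Hypothetical features: (age in years, systolic blood pressure in mmHg).
boxes = [
    Hyperbox(lower=(60, 140), upper=(90, 200)),  # older patients, elevated pressure
    Hyperbox(lower=(18, 160), upper=(40, 220)),  # younger patients, high pressure
]

decisions = [hyperbox_policy(boxes, x) for x in [(65, 150), (50, 150), (30, 170)]]
```

Read as a rule, this toy policy says: treat if (60 <= age <= 90 AND 140 <= pressure <= 200) OR (18 <= age <= 40 AND 160 <= pressure <= 220). The sketch only shows how such a policy is evaluated; finding the boxes from observational data is the optimization problem the abstract addresses with column generation and branch-and-bound, which is not reproduced here.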

Paper Title