Paper Title
Causality and Batch Reinforcement Learning: Complementary Approaches To Planning In Unknown Domains
Paper Authors
Paper Abstract
Reinforcement learning algorithms have had tremendous successes in online learning settings. However, these successes have relied on low-stakes interactions between the algorithmic agent and its environment. In many settings where RL could be of use, such as health care and autonomous driving, the mistakes made by most online RL algorithms during early training come with unacceptable costs. These settings require developing reinforcement learning algorithms that can operate in the so-called batch setting, where the algorithms must learn from a set of data that is fixed, finite, and generated from some (possibly unknown) policy. Evaluating policies different from the one that collected the data is called off-policy evaluation, and naturally poses counterfactual questions. In this project we show how off-policy evaluation and the estimation of treatment effects in causal inference are two approaches to the same problem, and compare recent progress in these two areas.
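To make the claimed correspondence concrete, here is a minimal sketch in standard notation (the symbols π_e, π_b, ê, and the estimator names below are ours, not taken from the paper). Given n trajectories collected under a behavior policy π_b, the per-trajectory importance sampling estimator of the value of an evaluation policy π_e reweights each observed return by how much more likely π_e was to produce the observed actions:

\[
\hat{V}_{\mathrm{IS}}(\pi_e) \;=\; \frac{1}{n}\sum_{i=1}^{n}\Bigg(\prod_{t=0}^{T-1}\frac{\pi_e\big(a_t^{(i)}\mid s_t^{(i)}\big)}{\pi_b\big(a_t^{(i)}\mid s_t^{(i)}\big)}\Bigg)\sum_{t=0}^{T-1}\gamma^{t}\, r_t^{(i)}.
\]

The inverse-propensity-weighted estimator of the average treatment effect from causal inference has the same form: with binary treatment \(A_i\), outcome \(Y_i\), covariates \(X_i\), and estimated propensity score \(\hat{e}(X_i) = \hat{P}(A_i = 1 \mid X_i)\),

\[
\hat{\tau}_{\mathrm{IPW}} \;=\; \frac{1}{n}\sum_{i=1}^{n}\left(\frac{A_i\,Y_i}{\hat{e}(X_i)} \;-\; \frac{(1-A_i)\,Y_i}{1-\hat{e}(X_i)}\right).
\]

Both estimators reweight observed outcomes by the ratio of the probability of the observed action under the policy being evaluated to its probability under the data-collecting mechanism; the IPW estimator can be read as the horizon-one special case in which the behavior policy is the propensity score and the two evaluation policies are "always treat" and "never treat."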