A Review of Off-Policy Evaluation in Reinforcement Learning

Paper Title

A Review of Off-Policy Evaluation in Reinforcement Learning

Authors

Masatoshi Uehara, Chengchun Shi, Nathan Kallus

Abstract

Reinforcement learning (RL) is one of the most vibrant research frontiers in machine learning and has been recently applied to solve a number of challenging problems. In this paper, we primarily focus on off-policy evaluation (OPE), one of the most fundamental topics in RL. In recent years, a number of OPE methods have been developed in the statistics and computer science literature. We provide a discussion on the efficiency bound of OPE, some of the existing state-of-the-art OPE methods, their statistical properties and some other related research directions that are currently actively explored.
