Paper title

Statistically Efficient Advantage Learning for Offline Reinforcement Learning in Infinite Horizons

Authors

Chengchun Shi, Shikai Luo, Yuan Le, Hongtu Zhu, Rui Song

Abstract

We consider reinforcement learning (RL) methods in offline domains, such as mobile health applications, where no additional online data can be collected. Most existing policy optimization algorithms in the computer science literature were developed for online settings, where data are easy to collect or simulate. Whether they generalize to mobile health applications with a pre-collected offline dataset remains unknown. The aim of this paper is to develop a novel advantage learning framework that uses pre-collected data efficiently for policy optimization. The proposed method takes as input an optimal Q-estimator computed by any existing state-of-the-art RL algorithm, and outputs a new policy whose value is guaranteed to converge at a faster rate than that of the policy derived from the initial Q-estimator. Extensive numerical experiments are conducted to back up our theoretical findings. A Python implementation of our proposed method is available at https://github.com/leyuanheart/SEAL.
