随机策略优化以最佳停止

论文标题

随机策略优化以最佳停止

Randomized Policy Optimization for Optimal Stopping

论文作者

Guan, Xinyi, Mišić, Velibor V.

论文摘要

最佳停止是确定何时停止随机系统以最大化奖励的问题，这在金融，运营管理和医疗保健等领域中至关重要。在实践中流行的高维最佳停止方法的现有方法产生确定性的线性策略 - 根据基础功能的加权总和的确定性停止的策略 - 但不能保证在固定基础功能架构中找到该策略类别中的最佳策略。在本文中，我们提出了一种基于随机线性策略的最佳停止方法的新方法，该方法选择以根据基础函数的加权总和确定的概率停止。我们通过确定在温和条件下，在固定基础功能体系结构中，我们激发了这些政策，对随机线性策略进行优化等同于优化确定性线性策略。我们从数据中学习随机线性策略的问题作为平滑的非凸样本平均近似（SAA）问题。从理论上讲，我们证明了我们的随机策略SAA问题几乎确定的收敛，并建立了基于Rademacher复杂性从我们的SAA问题获得的随机策略的样本外部性能。我们还表明，SAA问题是NP-HARD的一般，因此开发了一种实用的启发式方法来解决我们的随机政策问题。通过在选项定价问题实例的基准测试家族的数值实验中，我们表明我们的方法可以大大优于最先进的方法。

Optimal stopping is the problem of determining when to stop a stochastic system in order to maximize reward, which is of practical importance in domains such as finance, operations management and healthcare. Existing methods for high-dimensional optimal stopping that are popular in practice produce deterministic linear policies -- policies that deterministically stop based on the sign of a weighted sum of basis functions -- but are not guaranteed to find the optimal policy within this policy class given a fixed basis function architecture. In this paper, we propose a new methodology for optimal stopping based on randomized linear policies, which choose to stop with a probability that is determined by a weighted sum of basis functions. We motivate these policies by establishing that under mild conditions, given a fixed basis function architecture, optimizing over randomized linear policies is equivalent to optimizing over deterministic linear policies. We formulate the problem of learning randomized linear policies from data as a smooth non-convex sample average approximation (SAA) problem. We theoretically prove the almost sure convergence of our randomized policy SAA problem and establish bounds on the out-of-sample performance of randomized policies obtained from our SAA problem based on Rademacher complexity. We also show that the SAA problem is in general NP-Hard, and consequently develop a practical heuristic for solving our randomized policy problem. Through numerical experiments on a benchmark family of option pricing problem instances, we show that our approach can substantially outperform state-of-the-art methods.

下载PDF全文

下载文献需遵守相关版权规定

论文标题