Paper Title
Solving infinite-horizon POMDPs with memoryless stochastic policies in state-action space
Paper Authors
Paper Abstract
Reward optimization in fully observable Markov decision processes is equivalent to a linear program over the polytope of state-action frequencies. Taking a similar perspective in the case of partially observable Markov decision processes with memoryless stochastic policies, the problem was recently formulated as the optimization of a linear objective subject to polynomial constraints. Based on this, we present an approach for Reward Optimization in State-Action space (ROSA). We test this approach experimentally in maze navigation tasks. We find that ROSA is computationally efficient and can yield stability improvements over other existing methods.
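As background for the formulation the abstract refers to, the linear program over state-action frequencies for a discounted MDP can be sketched as follows; this is the standard dual LP, and the notation ($\mu$ for the state-action frequencies, $r$ for the reward, $P$ for the transition kernel, $\gamma$ for the discount factor, $\rho$ for the initial state distribution) is ours rather than taken from the paper:

$$
\max_{\mu \ge 0} \; \sum_{s,a} r(s,a)\,\mu(s,a)
\quad \text{s.t.} \quad
\sum_{a} \mu(s',a) \;=\; (1-\gamma)\,\rho(s') \;+\; \gamma \sum_{s,a} P(s' \mid s,a)\,\mu(s,a) \quad \forall s'.
$$

In the partially observable case with memoryless stochastic policies, the feasible region is no longer this polytope: requiring the policy induced by $\mu$ to depend on the state only through its observation adds polynomial (non-linear) constraints on $\mu$, which is the constrained formulation the abstract builds on.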