Risk Aware Adaptive Belief-dependent Probabilistically Constrained Continuous POMDP Planning

Paper Title

Risk Aware Adaptive Belief-dependent Probabilistically Constrained Continuous POMDP Planning

Authors

Andrey Zhitnikov, Vadim Indelman

Abstract

Although risk awareness is fundamental to an online operating agent, it has received less attention in the challenging continuous domain and under partial observability. This paper presents a novel formulation and solution for risk-averse belief-dependent probabilistically constrained continuous POMDP. We tackle a demanding setting of belief-dependent reward and constraint operators. The probabilistic confidence parameter makes our formulation genuinely risk-averse and much more flexible than the state-of-the-art chance constraint. Our rigorous analysis shows that in the stiffest probabilistic confidence case, our formulation is very close to chance constraint. However, our probabilistic formulation allows much faster and more accurate adaptive acceptance or pruning of actions fulfilling or violating the constraint. In addition, with an arbitrary confidence parameter, we did not find any analogs to our approach. We present algorithms for the solution of our formulation in continuous domains. We also uplift the chance-constrained approach to continuous environments using importance sampling. Moreover, all our presented algorithms can be used with parametric and nonparametric beliefs represented by particles. Last but not least, we contribute, rigorously analyze and simulate an approximation of chance-constrained continuous POMDP. The simulations demonstrate that our algorithms exhibit unprecedented celerity compared to the baseline, with the same performance in terms of collisions.
