Paper Title

Bayesian regularization of empirical MDPs

Paper Authors

Samarth Gupta, Daniel N. Hill, Lexing Ying, Inderjit Dhillon

Paper Abstract

In most applications of model-based Markov decision processes, the parameters for the unknown underlying model are often estimated from the empirical data. Due to noise, the policy learned from the estimated model is often far from the optimal policy of the underlying model. When applied to the environment of the underlying model, the learned policy results in suboptimal performance, thus calling for solutions with better generalization performance. In this work we take a Bayesian perspective and regularize the objective function of the Markov decision process with prior information in order to obtain more robust policies. Two approaches are proposed, one based on $L^1$ regularization and the other on relative entropic regularization. We evaluate our proposed algorithms on synthetic simulations and on real-world search logs of a large-scale online shopping store. Our results demonstrate the robustness of regularized MDP policies against the noise present in the models.
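
The abstract does not spell out the regularized objective, so the following is only a minimal, hypothetical NumPy sketch of the general idea: estimate a transition kernel from noisy counts, shrink it toward a prior kernel, and plan against the regularized model. The convex-combination shrinkage below is a stand-in for the paper's $L^1$ and relative-entropy penalties (for a per-row Dirichlet prior it matches the Bayesian posterior-mean estimate, with a count-dependent mixing weight); all function names, `lam`, and the toy MDP sizes are assumptions, not the authors' implementation.

```python
import numpy as np

def empirical_kernel(counts):
    """MLE transition estimate from visit counts of shape (S, A, S);
    state-action pairs with no data fall back to a uniform row."""
    totals = counts.sum(axis=-1, keepdims=True)
    uniform = np.ones_like(counts, dtype=float) / counts.shape[-1]
    return np.where(totals > 0, counts / np.maximum(totals, 1), uniform)

def regularize_toward_prior(P_hat, P_prior, lam):
    """Shrink the empirical kernel toward a prior kernel P_prior.
    Hypothetical stand-in for the paper's L^1 / relative-entropy
    regularization; with a Dirichlet prior over each row this is the
    posterior-mean estimate (with a count-dependent weight)."""
    return (1.0 - lam) * P_hat + lam * P_prior

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Plan against the (regularized) model; returns the greedy
    policy and values. P: (S, A, S) kernel, R: (S, A) rewards."""
    V = np.zeros(P.shape[0])
    while True:
        Q = R + gamma * (P @ V)        # (S, A) action values
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < tol:
            return Q.argmax(axis=1), V_new
        V = V_new

# Toy usage on synthetic counts (3 states, 2 actions).
rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(3, 2, 3)).astype(float)
R = rng.uniform(size=(3, 2))
P_prior = np.full((3, 2, 3), 1.0 / 3)  # uninformative uniform prior
P_reg = regularize_toward_prior(empirical_kernel(counts), P_prior, lam=0.2)
policy, values = value_iteration(P_reg, R)
```

Larger values of `lam` trust the prior more, which is exactly the trade-off the abstract describes: a regularized policy gives up some fidelity to the noisy empirical model in exchange for robustness when deployed in the true environment.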
