Paper Title
Bellman Meets Hawkes: Model-Based Reinforcement Learning via Temporal Point Processes
Paper Authors
Paper Abstract
We consider a sequential decision-making problem in which an agent faces an environment characterized by stochastic discrete events and seeks an optimal intervention policy that maximizes its long-term reward. This problem arises ubiquitously in social media, finance, and health informatics, but has rarely been investigated in conventional reinforcement learning research. To this end, we present a novel model-based reinforcement learning framework in which the agent's actions and observations are asynchronous stochastic discrete events occurring in continuous time. We model the dynamics of the environment by a Hawkes process with an external intervention control term and develop an algorithm to embed such a process in the Bellman equation, which guides the direction of the value gradient. We demonstrate the superiority of our method on both a synthetic simulator and a real-world problem.
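To make the environment model concrete, the sketch below simulates a one-dimensional Hawkes process whose conditional intensity includes an additive external control term, using Ogata's thinning algorithm. This is an illustrative reading of the abstract only: the function name `simulate_controlled_hawkes`, the exponential excitation kernel, and the choice of a constant control value are all assumptions, not details taken from the paper.

```python
import math
import random

def simulate_controlled_hawkes(mu, alpha, beta, control, horizon, seed=0):
    """Simulate a 1-D Hawkes process with conditional intensity

        lambda(t) = mu + control + sum_{t_i < t} alpha * exp(-beta * (t - t_i))

    using Ogata's thinning algorithm. `control` stands in for the paper's
    external intervention term, held constant here for simplicity."""
    rng = random.Random(seed)
    events = []
    t = 0.0

    def intensity(s):
        # Baseline + constant intervention + exponentially decaying
        # self-excitation from all past events.
        return mu + control + sum(
            alpha * math.exp(-beta * (s - ti)) for ti in events
        )

    while True:
        # With a constant control term, the intensity is nonincreasing
        # between events, so the current value bounds it until the next one.
        lam_bar = intensity(t)
        t += rng.expovariate(lam_bar)
        if t >= horizon:
            break
        if rng.random() <= intensity(t) / lam_bar:
            events.append(t)  # accept the candidate event (thinning step)
    return events
```

In a model-based RL loop of the kind the abstract describes, a learned version of this intensity would serve as the environment model, with the policy choosing the intervention term to shape future event arrivals.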