Paper Title


Spatio-temporal Incentives Optimization for Ride-hailing Services with Offline Deep Reinforcement Learning

Paper Authors

Yanqiu Wu, Qingyang Li, Zhiwei Qin

Paper Abstract


A fundamental question in any peer-to-peer ride-sharing system is how to meet passengers' requests both effectively and efficiently while balancing supply and demand in real time. On the passenger side, traditional approaches focus on pricing strategies that adjust the distribution of demand by changing the probability that users place ride requests. However, these methods do not account for the impact of the pricing strategy on future supply and demand: drivers are repositioned to different destinations as a result of the trips they serve, which affects their income over a subsequent period of time. Motivated by this observation, we attempt to optimize the distribution of demand by learning long-term spatio-temporal values as a guideline for the pricing strategy. In this study, we propose an offline deep reinforcement learning based method focusing on the demand side to improve the utilization of transportation resources and customer satisfaction. We adopt a spatio-temporal learning method to learn the value of different times and locations, and then incentivize passengers' ride requests to adjust the distribution of demand and balance supply and demand in the system. In particular, we model the problem as a Markov Decision Process (MDP).
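
To make the idea of spatio-temporal value learning concrete, below is a minimal sketch, not the authors' implementation: it assumes the city is discretized into grid cells and time bins, fits a tabular state value V(time, cell) offline from logged trips with a TD(0)-style update (a tabular stand-in for the paper's deep value network), and scores an incentive by how much higher the long-term value of a request's destination is than its origin. All names, the discretization, and the incentive rule are illustrative assumptions.

```python
# Hypothetical sketch: offline tabular spatio-temporal value learning and
# value-difference incentive scoring. Not the paper's actual algorithm.
from collections import defaultdict

GAMMA = 0.9        # discount factor per time bin (assumed)
ALPHA = 0.05       # TD learning rate (assumed)

# V[(time_bin, grid_cell)] -> estimated long-term value of supply being at
# that cell at that time.
V = defaultdict(float)

def td_update(trip):
    """One offline TD(0) update from a historical trip record.

    trip: dict with keys
      't0', 'origin' - time bin and grid cell where the trip started
      't1', 'dest'   - time bin and grid cell where the trip ended
      'fare'         - immediate reward collected on the trip
    """
    s0 = (trip['t0'], trip['origin'])
    s1 = (trip['t1'], trip['dest'])
    n_steps = max(trip['t1'] - trip['t0'], 1)           # elapsed time bins
    target = trip['fare'] + (GAMMA ** n_steps) * V[s1]  # bootstrapped return
    V[s0] += ALPHA * (target - V[s0])

def incentive_score(t0, origin, t1, dest, budget_scale=1.0):
    """Score a candidate incentive for a ride request: the discounted value of
    the destination minus the value of the origin. Positive scores suggest the
    request moves supply toward higher-value, under-served regions."""
    advantage = (GAMMA ** max(t1 - t0, 1)) * V[(t1, dest)] - V[(t0, origin)]
    return budget_scale * max(advantage, 0.0)

# Offline training sweeps over logged trips, then scoring a sample request.
historical_trips = [
    {'t0': 48, 'origin': (3, 5), 't1': 50, 'dest': (7, 2), 'fare': 12.5},
    {'t0': 50, 'origin': (7, 2), 't1': 51, 'dest': (7, 3), 'fare': 6.0},
]
for _ in range(100):
    for trip in historical_trips:
        td_update(trip)

print(incentive_score(48, (3, 5), 50, (7, 2)))
```

In this toy setup the value table plays the role of the guideline described in the abstract: requests whose destinations carry higher long-term value receive larger incentives, shifting the demand distribution toward a better supply-demand balance.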
