基于学习的单个服务器排队系统中的最佳入学控制

论文标题

基于学习的单个服务器排队系统中的最佳入学控制

Learning-based Optimal Admission Control in a Single Server Queuing System

论文作者

Cohen, Asaf, Subramanian, Vijay G., Zhang, Yili

论文摘要

我们考虑在M/M/1排队系统中最大化入院控制问题的长期平均利润，服务和到达率未知。在服务完成后收集了固定的奖励，并且在队列中等待的客户执行的每单位时间成本，调度员决定到达是否基于系统排队长度观察的完整观察史，是否承认到达客户。（NAOR 1969，计量经济学）表明，如果已知模型的所有参数，那么使用静态阈值策略是最佳选择 - 承认排队长度是否小于预定的阈值，否则则不是。我们提出了一种基于学习的调度算法，并在Naor的完整信息模型（1969）的最佳调度策略方面遗憾。 We show that the algorithm achieves an $O(1)$ regret when all optimal thresholds with full information are non-zero, and achieves an $O(\ln^{1+ε}(N))$ regret for any specified $ε>0$, in the case that an optimal threshold with full information is $0$ (i.e., an optimal policy is to reject all arrivals), where $N$ is the number of 到达。

We consider a long-term average profit maximizing admission control problem in an M/M/1 queuing system with unknown service and arrival rates. With a fixed reward collected upon service completion and a cost per unit of time enforced on customers waiting in the queue, a dispatcher decides upon arrivals whether to admit the arriving customer or not based on the full history of observations of the queue-length of the system. (Naor 1969, Econometrica) showed that if all the parameters of the model are known, then it is optimal to use a static threshold policy -- admit if the queue-length is less than a predetermined threshold and otherwise not. We propose a learning-based dispatching algorithm and characterize its regret with respect to optimal dispatch policies for the full information model of Naor (1969). We show that the algorithm achieves an $O(1)$ regret when all optimal thresholds with full information are non-zero, and achieves an $O(\ln^{1+ε}(N))$ regret for any specified $ε>0$, in the case that an optimal threshold with full information is $0$ (i.e., an optimal policy is to reject all arrivals), where $N$ is the number of arrivals.

下载PDF全文

下载文献需遵守相关版权规定

论文标题