论文标题
基于学习的单个服务器排队系统中的最佳入学控制
Learning-based Optimal Admission Control in a Single Server Queuing System
论文作者
论文摘要
我们考虑在M/M/1排队系统中最大化入院控制问题的长期平均利润,服务和到达率未知。在服务完成后收集了固定的奖励,并且在队列中等待的客户执行的每单位时间成本,调度员决定到达是否基于系统排队长度观察的完整观察史,是否承认到达客户。 (NAOR 1969,计量经济学)表明,如果已知模型的所有参数,那么使用静态阈值策略是最佳选择 - 承认排队长度是否小于预定的阈值,否则则不是。我们提出了一种基于学习的调度算法,并在Naor的完整信息模型(1969)的最佳调度策略方面遗憾。 We show that the algorithm achieves an $O(1)$ regret when all optimal thresholds with full information are non-zero, and achieves an $O(\ln^{1+ε}(N))$ regret for any specified $ε>0$, in the case that an optimal threshold with full information is $0$ (i.e., an optimal policy is to reject all arrivals), where $N$ is the number of 到达。
We consider a long-term average profit maximizing admission control problem in an M/M/1 queuing system with unknown service and arrival rates. With a fixed reward collected upon service completion and a cost per unit of time enforced on customers waiting in the queue, a dispatcher decides upon arrivals whether to admit the arriving customer or not based on the full history of observations of the queue-length of the system. (Naor 1969, Econometrica) showed that if all the parameters of the model are known, then it is optimal to use a static threshold policy -- admit if the queue-length is less than a predetermined threshold and otherwise not. We propose a learning-based dispatching algorithm and characterize its regret with respect to optimal dispatch policies for the full information model of Naor (1969). We show that the algorithm achieves an $O(1)$ regret when all optimal thresholds with full information are non-zero, and achieves an $O(\ln^{1+ε}(N))$ regret for any specified $ε>0$, in the case that an optimal threshold with full information is $0$ (i.e., an optimal policy is to reject all arrivals), where $N$ is the number of arrivals.