We consider a long-term average profit maximizing admission control problem in an M/M/1 queuing system with a known arrival rate but an unknown service rate. With a fixed reward collected upon service completion and a cost per unit of time enforced on customers waiting in the queue, a dispatcher decides upon arrivals whether to admit the arriving customer or not based on the full history of observations of the queue-length of the system. \cite[Econometrica]{Naor} showed that if all the parameters of the model are known, then it is optimal to use a static threshold policy - admit if the queue-length is less than a predetermined threshold and otherwise not. We propose a learning-based dispatching algorithm and characterize its regret with respect to optimal dispatch policies for the full information model of \cite{Naor}. We show that the algorithm achieves an $O(1)$ regret when all optimal thresholds with full information are non-zero, and achieves an $O(\ln^{3+\epsilon}(N))$ regret in the case that an optimal threshold with full information is $0$ (i.e., an optimal policy is to reject all arrivals), where $N$ is the number of arrivals and $\epsilon>0$.
翻译:在M/M/1排队制度中,我们考虑长期平均利润最大化入境控制问题,在已知抵达率为已知抵达率但服务率不明的排队制度中,我们考虑长期平均利润最大化的入场控制问题。在服务完成时收集固定的奖赏,对等候队列的客户按单位时间计费,调度员根据对抵达客户是否接收全程系统队列长度的观察历史来决定抵达目的地。\cite[Economica]{Naor}显示,如果该模型的所有参数都为人所知,那么最好采用静态的门槛政策——如果排队长度低于预定的门槛,则承认不为预定的。我们建议基于学习的发送算法,并遗憾地指出,如果对全程信息模式的最佳发送政策不是零美元,那么算法就会达到1美元,如果全部信息的最佳门槛是0美元(i.i.n.),那么最优政策就是拒绝所有到达点的美元。