We study a dynamic pricing and capacity sizing problem in a $GI/GI/1$ queue, where the service provider's objective is to obtain the optimal service fee $p$ and service capacity $\mu$ so as to maximize the cumulative expected profit (the service revenue minus the staffing cost and delay penalty). Due to the complex nature of the queueing dynamics, the problem has no analytic solution, so previous research often resorts to heavy-traffic analysis, in which both the arrival rate and the service rate are sent to infinity. In this work we propose an online learning framework for this problem that does not require the system's scale to increase. Our framework is dubbed Gradient-based Online Learning in Queue (GOLiQ). GOLiQ organizes the time horizon into successive operational cycles and prescribes an efficient procedure to obtain improved pricing and staffing policies in each cycle using data collected in previous cycles. Data here include the number of customer arrivals, waiting times, and the server's busy times. The ingenuity of this approach lies in its online nature, which allows the service provider to do better by interacting with the environment. The effectiveness of GOLiQ is substantiated by (i) theoretical results, including convergence of the algorithm and a regret analysis (with a logarithmic regret bound), and (ii) engineering confirmation via simulation experiments on a variety of representative $GI/GI/1$ queues.
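To make the cycle-based idea concrete, the following is a minimal Python sketch; it is not the authors' GOLiQ algorithm. It assumes an $M/M/1$ special case, a hypothetical linear demand curve `demand_rate`, a linear staffing cost, a delay penalty proportional to total time in system, and box constraints on $(p, \mu)$; every name and constant is illustrative. Since the abstract does not specify the gradient estimator, the sketch substitutes central finite differences with common random numbers, evaluated on simulated cycles, followed by a projected gradient step with step size $\eta/k$ in cycle $k$.

```python
import random

# Hypothetical feasible box for the fee p and the capacity mu (assumptions).
P_LO, P_HI = 1.0, 4.0
MU_LO, MU_HI = 1.0, 20.0


def demand_rate(p):
    # Assumed linear demand curve: the arrival rate falls as the fee rises.
    return max(0.1, 10.0 - 2.0 * p)


def simulate_cycle(p, mu, horizon, seed):
    """Simulate one cycle of an M/M/1 FIFO queue that starts empty.

    Returns (number of arrivals, list of waiting times, total service time
    of the customers who arrived during the cycle).
    """
    rng = random.Random(seed)
    lam = demand_rate(p)
    t = rng.expovariate(lam)      # arrival time of the first customer
    server_free_at = 0.0          # time at which the current backlog clears
    waits, busy = [], 0.0
    while t < horizon:
        start = max(t, server_free_at)
        service = rng.expovariate(1.0) / mu  # unit-mean workload scaled by capacity
        waits.append(start - t)
        busy += service
        server_free_at = start + service
        t += rng.expovariate(lam)
    return len(waits), waits, busy


def cycle_profit(p, mu, horizon, seed, h0=1.0, c=1.0):
    # Profit per unit time: revenue minus delay penalty minus staffing cost.
    n, waits, busy = simulate_cycle(p, mu, horizon, seed)
    time_in_system = sum(waits) + busy
    return (p * n - h0 * time_in_system - c * mu * horizon) / horizon


def project(x, lo, hi):
    # Projection onto the interval [lo, hi].
    return min(hi, max(lo, x))


def learn(p0, mu0, cycles=50, horizon=200.0, delta=0.05, eta=0.5):
    p, mu = p0, mu0
    for k in range(1, cycles + 1):
        # Central finite differences; reusing seed k gives common random numbers.
        grad_p = (cycle_profit(p + delta, mu, horizon, k)
                  - cycle_profit(p - delta, mu, horizon, k)) / (2 * delta)
        grad_mu = (cycle_profit(p, mu + delta, horizon, k)
                   - cycle_profit(p, mu - delta, horizon, k)) / (2 * delta)
        step = eta / k  # diminishing step size
        p = project(p + step * grad_p, P_LO, P_HI)
        mu = project(mu + step * grad_mu, MU_LO, MU_HI)
    return p, mu
```

Calling `learn(2.0, 8.0)` returns the fee and capacity reached after 50 cycles. Because the sketch replays each cycle under perturbed parameters, it is a simulation-based stand-in rather than a truly online procedure, which would have to form its gradient estimate from the data observed in the cycle actually operated.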