Our work is motivated by a common business constraint in online markets. While firms recognize the advantages of dynamic pricing and price experimentation, they must, for various practical reasons, keep the number of price changes (i.e., switches) within some budget. We study both the classical price-based network revenue management problem in the distributionally unknown setup and the bandits with knapsacks problem. In these problems, a decision-maker (without prior knowledge of the environment) has a finite initial inventory of multiple resources to allocate over a finite time horizon. Beyond the classical resource constraints, we introduce an additional switching constraint to these problems, which limits the total number of times that the decision-maker switches between actions to a fixed switching budget. For such problems, we show matching upper and lower bounds on the optimal regret and propose computationally efficient limited-switch algorithms that achieve the optimal regret. Our work reveals a surprising result: the optimal regret rate is completely characterized by a piecewise-constant function of the switching budget, which further depends on the number of resource constraints. To the best of our knowledge, this is the first time the number of resource constraints has been shown to play a fundamental role in determining the statistical complexity of online learning problems. We conduct computational experiments to examine the performance of our algorithms on a numerical setup that is widely used in the literature. Compared with benchmark algorithms from the literature, our proposed algorithms achieve promising performance, with clear advantages in the number of incurred switches. Practically, firms can benefit from our study and improve their learning and decision-making performance when they simultaneously face resource and switching constraints.
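To make the switching constraint concrete, the sketch below counts switches in a sequence of chosen actions (e.g., posted prices) and checks the count against a switching budget. It is only an illustration of the constraint, not the authors' algorithm; the helper names `count_switches` and `within_switching_budget` are hypothetical, and it assumes that a switch is counted whenever the action in period t differs from the action in period t-1.

```python
from typing import Hashable, Sequence


def count_switches(actions: Sequence[Hashable]) -> int:
    """Count periods in which the chosen action differs from the previous period's.

    Assumption (for illustration): a "switch" is any change between
    consecutive actions; the first action does not count as a switch.
    """
    return sum(1 for prev, cur in zip(actions, actions[1:]) if cur != prev)


def within_switching_budget(actions: Sequence[Hashable], budget: int) -> bool:
    """Return True if the action sequence uses at most `budget` switches."""
    return count_switches(actions) <= budget


if __name__ == "__main__":
    # Hypothetical price path over 8 periods: 10 -> 12 -> 9 -> 10 gives 3 switches.
    prices = [10, 10, 12, 12, 12, 9, 9, 10]
    print(count_switches(prices))              # 3
    print(within_switching_budget(prices, 2))  # False
```

A limited-switch policy must keep this count within the budget over the whole horizon, in addition to never consuming more of any resource than its initial inventory.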