Phase Transitions in Learning and Earning under Price Protection Guarantee

Motivated by the prevalence of the ``price protection guarantee'', which allows a customer who purchased a product to receive a refund from the seller if the seller lowers the price during the so-called price protection period (typically a fixed time window after the purchase date), we study the impact of this policy on the design of online learning algorithms for data-driven dynamic pricing with initially unknown customer demand. We consider a setting where a firm sells a product over a horizon of $T$ time steps. For this setting, we characterize how $M$, the length of the price protection period, affects the optimal regret of the learning process. We show that the optimal regret is $\tilde{\Theta}(\sqrt{T}+\min\{M,\,T^{2/3}\})$. We first establish a fundamental impossibility regime with novel regret lower bound instances. Then, we propose LEAP, a phased-exploration-type algorithm for \underline{L}earning and \underline{EA}rning under \underline{P}rice Protection, which matches this lower bound up to logarithmic factors, or even doubly logarithmic factors when only two prices are available to the seller. Our results reveal surprising phase transitions of the optimal regret with respect to $M$. Specifically, when $M$ is not too large, the optimal regret does not differ substantially from that of the classic setting with no price protection guarantee. We also show that there is an upper limit on how much the optimal regret can deteriorate as $M$ grows large. Finally, we conduct extensive numerical experiments to show the benefit of LEAP over other heuristic methods for this problem.
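
The phase transitions can be read directly off the stated rate $\sqrt{T}+\min\{M,\,T^{2/3}\}$. The following is a minimal sketch, not taken from the paper, that evaluates this expression with constants and logarithmic factors dropped and labels the dominating term. The function names `regret_rate` and `regime` are hypothetical, and the thresholds $\sqrt{T}$ and $T^{2/3}$ are simply the points at which the dominant term of the formula changes.

```python
import math


def regret_rate(T: int, M: int) -> float:
    """Order of the optimal regret, sqrt(T) + min(M, T^(2/3)).

    Constants and logarithmic factors are dropped (illustrative assumption).
    """
    return math.sqrt(T) + min(M, T ** (2 / 3))


def regime(T: int, M: int) -> str:
    """Label which term of the rate dominates for a given (T, M)."""
    if M <= math.sqrt(T):
        # min(M, T^(2/3)) = M <= sqrt(T), so the rate is of order sqrt(T),
        # the same order as with no price protection.
        return "sqrt(T)-dominated"
    if M < T ** (2 / 3):
        # The rate grows linearly in M between the two thresholds.
        return "M-dominated"
    # Beyond T^(2/3) the rate no longer depends on M.
    return "T^(2/3)-capped"


if __name__ == "__main__":
    T = 1_000_000
    for M in (100, 5_000, 50_000):
        print(M, regime(T, M), round(regret_rate(T, M)))
```

With $T = 10^6$, the thresholds are $\sqrt{T} = 10^3$ and $T^{2/3} = 10^4$, so the three sample values of $M$ fall into the three regimes in turn.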
