Self-adapting Robustness in Demand Learning

We study dynamic pricing over a finite number of periods in the presence of demand model ambiguity. Departing from the typical no-regret learning environment, where price changes are allowed at any time, we consider a setting in which pricing decisions are made at pre-specified points in time and each price can be applied to a large number of arrivals. In this environment, which arises in retailing, a pricing decision based on an incorrect demand model can significantly impact cumulative revenue. We develop an adaptively-robust-learning (ARL) pricing policy that learns the true model parameters from the data while actively managing demand model ambiguity. It optimizes an objective that is robust with respect to a self-adapting set of demand models, where a given model is included in this set only if the sales data revealed from prior pricing decisions make it "probable". As a result, it gracefully transitions from being robust when demand model ambiguity is high to minimizing regret when this ambiguity diminishes upon receiving more data. We characterize the stochastic behavior of ARL's self-adapting ambiguity sets and derive a regret bound that highlights the link between the scale of revenue loss and the customer arrival pattern. We also show that ARL, by being conscious of both model ambiguity and revenue, bridges the gap between a distributionally robust policy, which focuses on model ambiguity, and a follow-the-leader policy, which focuses on revenue. We numerically find that the ARL policy, or an extension thereof, exhibits superior performance compared to distributionally robust, follow-the-leader, and upper-confidence-bound policies in terms of expected revenue and/or value at risk.
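To make the idea of a self-adapting ambiguity set concrete, the following is a minimal sketch and not the ARL policy itself. It assumes a finite list of hypothetical linear demand models, a likelihood-ratio rule for deciding which models remain "probable", and a max-min choice over a small price grid. All names, parameter values, and the threshold rule are illustrative assumptions that do not come from the abstract.

```python
import math

# Hypothetical candidate demand models (assumption): the purchase probability
# at price p is a - b * p, clipped to [0, 1]. The (a, b) pairs are illustrative.
CANDIDATE_MODELS = [(0.9, 0.10), (0.8, 0.05), (0.6, 0.02)]
PRICES = [2.0, 4.0, 6.0, 8.0]  # hypothetical price grid


def purchase_prob(model, price):
    """Purchase probability of one arrival under a linear demand model."""
    a, b = model
    return min(1.0, max(0.0, a - b * price))


def log_likelihood(model, history):
    """Binomial log-likelihood of (price, arrivals, sales) observations."""
    total = 0.0
    for price, arrivals, sales in history:
        # Clamp away from 0 and 1 so the logarithms stay finite.
        q = min(max(purchase_prob(model, price), 1e-9), 1.0 - 1e-9)
        total += sales * math.log(q) + (arrivals - sales) * math.log(1.0 - q)
    return total


def ambiguity_set(history, threshold):
    """Keep models whose log-likelihood is within `threshold` of the best one.

    This likelihood-ratio rule is an assumed stand-in for "probable".
    """
    scores = {m: log_likelihood(m, history) for m in CANDIDATE_MODELS}
    best = max(scores.values())
    return [m for m in CANDIDATE_MODELS if scores[m] >= best - threshold]


def robust_price(models):
    """Max-min price: maximize the worst-case expected revenue per arrival."""
    return max(PRICES, key=lambda p: min(p * purchase_prob(m, p) for m in models))


if __name__ == "__main__":
    # No data yet: every model is retained and the robust price is cautious.
    print(robust_price(ambiguity_set([], threshold=2.0)))
    # After one period with many arrivals, the set shrinks to a single model.
    history = [(8.0, 1000, 440)]
    print(robust_price(ambiguity_set(history, threshold=2.0)))
```

With no sales history, all three hypothetical models stay in the set and the max-min rule picks the price 4.0. After one period with 1000 arrivals and 440 sales at price 8.0, only the third model survives the threshold and the chosen price moves to 8.0, which is that model's revenue-maximizing price on the grid. This mirrors, in a toy setting, the transition from robustness to revenue maximization as ambiguity diminishes.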
