高维度非静止动态定价：利用变点检测进行学习与获得 (High-Dimensional Dynamic Pricing under Non-Stationarity: Learning and Earning with Change-Point Detection)

We consider a high-dimensional dynamic pricing problem under non-stationarity, where a firm sells products to $T$ sequentially arriving consumers that behave according to an unknown demand model with potential changes at unknown times. The demand model is assumed to be a high-dimensional generalized linear model (GLM), allowing for a feature vector in $\mathbb R^d$ that encodes products and consumer information. To achieve optimal revenue (i.e., least regret), the firm needs to learn and exploit the unknown GLMs while monitoring for potential change-points. To tackle such a problem, we first design a novel penalized likelihood-based online change-point detection algorithm for high-dimensional GLMs, which is the first algorithm in the change-point literature that achieves optimal minimax localization error rate for high-dimensional GLMs. A change-point detection assisted dynamic pricing (CPDP) policy is further proposed and achieves a near-optimal regret of order $O(s\sqrt{\Upsilon_T T}\log(Td))$, where $s$ is the sparsity level and $\Upsilon_T$ is the number of change-points. This regret is accompanied with a minimax lower bound, demonstrating the optimality of CPDP (up to logarithmic factors). In particular, the optimality with respect to $\Upsilon_T$ is seen for the first time in the dynamic pricing literature, and is achieved via a novel accelerated exploration mechanism. Extensive simulation experiments and a real data application on online lending illustrate the efficiency of the proposed policy and the importance and practical value of handling non-stationarity in dynamic pricing.

翻译：我们考虑一种高维度非静止动态定价问题，其中一家公司向 $T$ 个连续到达的消费者销售产品，这些消费者的行为符合一个未知需求模型，可能在未知时间发生变化。需求模型被假定为高维广义线性模型（GLM），允许具有在 $\mathbb R^d$ 中的特征向量，其中编码产品和消费者信息。要实现最优收益（即最小反悔），公司需要在监测潜在的变点的同时学习和利用未知的 GLMs。为了解决这个问题，我们首先设计了一种基于惩罚最大似然的在线变点检测算法，该算法是变点文献中针对高维 GLMs 实现局部化最小化误差率的第一个算法。进一步提出了一种变点检测辅助的动态定价（CPDP）策略，该策略实现了近似最优的反悔率，其顺序为 $O(s \sqrt{\Upsilon_T T}\log(Td))$，其中 $s$ 是稀疏级别，$\Upsilon_T$ 是变点数量。这个反悔率伴随着极小值下限，证明了 CPDP（对数因子）的最优性。特别是，在动态定价文献中，针对 $\Upsilon_T$ 的最优性第一次实现，并通过新颖的加速探索机制实现。广泛的模拟实验和在线借贷的实际应用说明了所提出的策略的有效性以及在动态定价中处理非静止性的重要性和实际价值。