In feature-based dynamic pricing, a seller sets prices on the fly for a sequence of products (described by feature vectors), learning from the binary outcomes of previous sales sessions ("Sold" if valuation $\geq$ price, and "Not Sold" otherwise). Existing work assumes either a noiseless linear valuation or a precisely known noise distribution, which limits the practical applicability of these algorithms when such assumptions are hard to verify. In this work, we study two more agnostic models: (a) a "linear policy" problem, where we aim to compete with the best linear pricing policy while making no assumptions on the data, and (b) a "linear noisy valuation" problem, where the random valuation is linear plus an unknown, assumption-free noise. For the former model, we show that the minimax regret is $\tilde{\Theta}(d^{\frac13}T^{\frac23})$, that is, tight up to logarithmic factors. For the latter model, we present an algorithm that achieves $\tilde{O}(T^{\frac34})$ regret, and we improve the best-known lower bound from $\Omega(T^{\frac35})$ to $\tilde{\Omega}(T^{\frac23})$. These results demonstrate that no-regret learning is possible for feature-based dynamic pricing under weak assumptions, but they also reveal a disappointing fact: the seemingly richer pricing feedback is not significantly more useful than bandit feedback for reducing regret.
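To make the feedback model concrete, the following is a minimal Python sketch of the interaction protocol described above: the seller posts a price for each feature vector and observes only "Sold" or "Not Sold". The names `sold`, `linear_policy`, and `run_pricing`, as well as the toy data, are illustrative assumptions and not part of the paper; the sketch covers only the feedback and revenue rule, not the learning algorithms or regret bounds.

```python
from typing import Callable, List, Sequence, Tuple

Features = Sequence[float]


def sold(valuation: float, price: float) -> bool:
    """Binary feedback: "Sold" (True) iff valuation >= price."""
    return valuation >= price


def linear_policy(weights: Sequence[float]) -> Callable[[Features], float]:
    """Hypothetical linear pricing policy: price = <weights, features>."""
    def policy(x: Features) -> float:
        return sum(w * xi for w, xi in zip(weights, x))
    return policy


def run_pricing(
    features: Sequence[Features],
    valuations: Sequence[float],
    policy: Callable[[Features], float],
) -> Tuple[float, List[bool]]:
    """Post one price per product; collect revenue only when the item sells.

    The seller never sees the valuations, only the returned outcomes.
    """
    total_revenue = 0.0
    outcomes: List[bool] = []
    for x, v in zip(features, valuations):
        price = policy(x)
        outcome = sold(v, price)
        outcomes.append(outcome)
        if outcome:
            total_revenue += price
    return total_revenue, outcomes


# Toy data (assumed for illustration only).
features = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
valuations = [2.0, 1.0, 2.5]
total, outcomes = run_pricing(features, valuations, linear_policy((1.5, 1.5)))
```

In this toy run the posted prices are 1.5, 1.5, and 3.0, so only the first product sells. A price set too high earns nothing, while a price set too low forgoes revenue, and the seller learns only which side of the valuation the price fell on.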