Contextual dynamic pricing aims to set personalized prices based on sequential interactions with customers. At each time period, a customer who is interested in purchasing a product arrives at the platform. The customer's valuation of the product is a linear function of contexts, including product and customer features, plus random market noise. The seller does not observe the customer's true valuation and must instead learn it from contextual information and historical binary purchase feedback. Existing models typically assume full or partial knowledge of the noise distribution. In this paper, we consider contextual dynamic pricing with unknown random noise in the valuation model. Our distribution-free pricing policy learns the contextual function and the market noise simultaneously. A key ingredient of our method is a novel perturbed linear bandit framework, in which we propose a modified linear upper confidence bound algorithm that balances exploration of the market noise against exploitation of current knowledge for better pricing. We establish a regret upper bound and a matching lower bound for our policy in the perturbed linear bandit framework, and we prove a sub-linear regret bound for the pricing problem considered. Finally, we demonstrate the superior performance of our policy in simulations and on a real-life auto-loan dataset.
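The valuation and feedback model described in the abstract can be illustrated with a minimal sketch. It is not the paper's pricing policy: it only simulates a linear valuation plus noise and the binary purchase signal the seller observes. The function names, the tie-breaking rule (a purchase occurs when the valuation is at least the price), and the revenue definition (price times purchase indicator) are assumptions made for illustration.

```python
import random


def valuation(theta, x, noise):
    """Linear valuation: inner product of theta and context x, plus market noise."""
    return sum(t * xi for t, xi in zip(theta, x)) + noise


def purchase_feedback(theta, x, price, noise):
    """Binary signal seen by the seller (assumed rule: buy if valuation >= price)."""
    return 1 if valuation(theta, x, noise) >= price else 0


def simulate(theta, contexts, prices, noise_sampler, seed=0):
    """Run one pass over customers; return the feedback list and total revenue.

    noise_sampler is a hypothetical callable taking a random.Random instance
    and returning one noise draw; the seller never sees the noise or the
    valuation, only the 0/1 feedback.
    """
    rng = random.Random(seed)
    feedback = []
    revenue = 0.0
    for x, price in zip(contexts, prices):
        y = purchase_feedback(theta, x, price, noise_sampler(rng))
        feedback.append(y)
        revenue += price * y  # revenue is earned only when a purchase occurs
    return feedback, revenue
```

For instance, calling `simulate` with `noise_sampler=lambda rng: rng.gauss(0.0, 1.0)` draws Gaussian noise, though any other sampler could be swapped in, which is the setting the distribution-free policy targets.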