A ubiquitous learning problem in today's digital market is how a seller, over repeated interactions with a buyer, can gradually learn optimal pricing decisions from the buyer's past purchase responses. A fundamental challenge of learning in such a strategic setup is that the buyer naturally has incentives to manipulate his responses in order to induce learning outcomes more favorable to him. To understand the limits of the seller's learning when facing such a strategic and possibly manipulative buyer, we study a natural yet powerful buyer manipulation strategy: before the pricing game starts, the buyer simply commits to "imitate" a different value function by pretending to always react optimally according to this imitative value function. We fully characterize the optimal imitative value function for the buyer, as well as the resulting seller revenue and buyer surplus under this optimal buyer manipulation. Our characterizations reveal many useful insights about what happens at equilibrium. For example, a seller with concave production cost will obtain essentially zero revenue at equilibrium, whereas the revenue of a seller with convex production cost is the Bregman divergence of her cost function between no production and certain production. Finally, and importantly, we show that a more powerful class of pricing schemes does not necessarily increase, and in fact may harm, the seller's revenue. Our results not only lead to an effective prescriptive way for buyers to manipulate learning algorithms but also shed light on the limits of what a seller can really achieve when pricing in the dark.