Algorithmic pricing on e-commerce platforms raises the concern of tacit collusion, where reinforcement learning algorithms learn to set collusive prices in a decentralized manner and through nothing more than profit feedback. This raises the question of whether collusive pricing can be prevented through the design of suitable "buy boxes," i.e., through the design of the rules that govern the elements of e-commerce sites that promote particular products and prices to consumers. In previous work, Johnson et al. (2020) designed hand-crafted buy box rules that use demand steering, based on the history of pricing by sellers, to prevent collusive behavior. Although effective against price collusion, these rules achieve this by imposing severe restrictions on consumer choice, and thus on consumer welfare. In this paper, we demonstrate that reinforcement learning (RL) can also be used by platforms to learn buy box rules that are effective in preventing collusion by RL sellers, and to do so without reducing consumer choice. For this, we adopt the methodology of Stackelberg MDPs, and demonstrate success in learning robust rules that continue to provide high consumer welfare even when sellers employ different behavior models or have out-of-distribution costs for goods.
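To make the setting concrete, the sketch below shows the kind of environment the abstract describes: independent Q-learning sellers that repeatedly post prices and observe only their own profits, and a platform buy box rule that steers demand toward one promoted offer. This is a minimal illustrative sketch, not the paper's implementation; the logit demand model, the lowest-price buy box rule, and all parameter values are assumptions introduced here for exposition.

```python
# Minimal sketch (illustrative assumptions throughout): two independent
# Q-learning sellers set prices from a small grid, receive only their own
# profit as feedback, and a platform "buy box" rule decides which offer is
# promoted to consumers.
import numpy as np

rng = np.random.default_rng(0)

PRICES = np.linspace(1.0, 2.0, 11)   # discrete price grid (assumed)
COST = 1.0                            # marginal cost for both sellers (assumed)
N_ACTIONS = len(PRICES)


def logit_demand(p_own, p_rival, promoted, mu=0.25, boost=1.0):
    """Share of demand for a seller under a simple logit model.

    `promoted` adds a utility boost to the seller shown in the buy box;
    this is a stand-in for demand steering, not the paper's exact model."""
    u_own = -p_own / mu + (boost if promoted else 0.0)
    u_rival = -p_rival / mu + (0.0 if promoted else boost)
    e = np.exp([u_own, u_rival, 0.0])  # third term: outside option
    return e[0] / e.sum()


def buy_box_lowest_price(p0, p1):
    """Illustrative hand-crafted buy box rule: promote the lower-priced seller."""
    return 0 if p0 <= p1 else 1


# Independent tabular Q-learning, kept stateless for brevity
# (one Q-value per price action for each seller).
Q = np.zeros((2, N_ACTIONS))
eps, alpha = 0.1, 0.1

for t in range(50_000):
    # epsilon-greedy price selection for each seller
    a = [rng.integers(N_ACTIONS) if rng.random() < eps else int(np.argmax(Q[i]))
         for i in range(2)]
    p = [PRICES[a[0]], PRICES[a[1]]]
    winner = buy_box_lowest_price(p[0], p[1])
    for i in range(2):
        share = logit_demand(p[i], p[1 - i], promoted=(winner == i))
        profit = (p[i] - COST) * share
        # profit is the only feedback each seller observes
        Q[i, a[i]] += alpha * (profit - Q[i, a[i]])

print("learned prices:", PRICES[np.argmax(Q[0])], PRICES[np.argmax(Q[1])])
```

In the paper's framing, the fixed `buy_box_lowest_price` rule above would be replaced by a platform policy that is itself learned with RL as the leader in a Stackelberg MDP, with the sellers adapting as followers.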