Online Optimization Algorithms in Repeated Price Competition: Equilibrium Learning and Algorithmic Collusion

This paper examines whether online learning algorithms widely used for pricing independently reach competitive outcomes or instead foster tacit collusion. The question has drawn considerable attention from competition regulators as algorithmic pricing becomes more common in digital markets, and understanding when such algorithms lead to equilibrium prices rather than supra-competitive prices matters for buyers, sellers, and policymakers. We study the behavior of multi-armed bandit algorithms in repeated price competition. These algorithms observe only the profits from the prices they choose, which makes them realistic models of automated pricing. Our formal analysis shows that an important class of online learning algorithms, called mean-based algorithms, reliably converges to Nash equilibrium in Bertrand competition. This finding is notable because online learning algorithms do not, in general, guarantee convergence. Extensive numerical experiments with different bandit algorithms confirm that most widely used algorithms, including those that are not mean-based, also converge to equilibrium. We observe supra-competitive prices only in specific cases where all sellers implement the same symmetric version of certain algorithms, such as UCB or Q-learning, and this effect diminishes as the number of competitors increases. Our results suggest that the risk of algorithmic collusion in competitive markets is often overstated: for most practical implementations of bandit algorithms, sellers' prices converge to competitive levels. These insights are relevant to regulators concerned with consumer welfare and to managers considering algorithmic pricing tools. While vigilance is warranted, fears of widespread algorithm-driven collusion may be exaggerated.
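To make the setting concrete, the following is a minimal sketch of bandit learners in repeated Bertrand competition. It is not the paper's experimental setup. The unit-demand rule, zero marginal cost, ten-point price grid, epsilon-greedy learner, and decaying exploration rate are all assumptions chosen for illustration, and the names `bertrand_profits`, `EpsilonGreedySeller`, and `simulate` are hypothetical. Each seller sees only the profit of the price it posted, as described in the abstract. The sketch makes no claim about whether this learner is mean-based in the paper's formal sense.

```python
import random


def bertrand_profits(prices, cost=0.0):
    """Assumed demand model: sellers posting the lowest price split one unit of demand."""
    low = min(prices)
    winners = [i for i, p in enumerate(prices) if p == low]
    share = 1.0 / len(winners)
    return [(p - cost) * share if i in winners else 0.0
            for i, p in enumerate(prices)]


class EpsilonGreedySeller:
    """Bandit learner that tracks the average profit of each price (arm)."""

    def __init__(self, grid, rng):
        self.grid = grid
        self.rng = rng
        self.counts = [0] * len(grid)
        self.means = [0.0] * len(grid)
        self.t = 0

    def choose(self):
        self.t += 1
        # Decaying exploration rate (an illustrative choice, not from the paper).
        eps = min(1.0, len(self.grid) / self.t)
        if self.rng.random() < eps:
            return self.rng.randrange(len(self.grid))
        best = max(self.means)
        # Break ties among the best arms uniformly at random.
        return self.rng.choice([k for k, m in enumerate(self.means) if m == best])

    def update(self, arm, profit):
        # Bandit feedback: only the chosen arm's running mean is updated.
        self.counts[arm] += 1
        self.means[arm] += (profit - self.means[arm]) / self.counts[arm]


def simulate(n_sellers=2, rounds=5000, seed=0):
    rng = random.Random(seed)
    grid = [k / 10 for k in range(1, 11)]  # assumed price grid 0.1, ..., 1.0
    sellers = [EpsilonGreedySeller(grid, rng) for _ in range(n_sellers)]
    history = []
    for _ in range(rounds):
        arms = [s.choose() for s in sellers]
        prices = [grid[a] for a in arms]
        profits = bertrand_profits(prices)
        for seller, arm, profit in zip(sellers, arms, profits):
            seller.update(arm, profit)
        history.append(prices)
    return grid, history


if __name__ == "__main__":
    grid, history = simulate()
    print("prices posted in the last five rounds:", history[-5:])
```

In this toy setup, prices near the bottom of the grid play the role of the competitive benchmark. Raising `n_sellers` gives a rough way to explore how outcomes change with the number of competitors.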
