We study a repeated game between a supplier and a retailer who each seek to maximize their own profit without full knowledge of the problem parameters. After characterizing the uniqueness of the Stackelberg equilibrium of the stage game with complete information, we show that even with partial knowledge of the joint distribution of demand and production costs, natural learning dynamics guarantee convergence of the joint strategy profile of supplier and retailer to the Stackelberg equilibrium of the stage game. We also prove finite-time bounds on the supplier's regret and asymptotic bounds on the retailer's regret, where the specific rates depend on the type of knowledge available to the players in advance. In the special case where the supplier is not strategic (vertical integration), we prove optimal finite-time bounds on the retailer's regret (or, equivalently, the social welfare) when costs and demand are adversarially generated and the demand is censored.