Pricing decisions are increasingly made by AI. Because they can train on live market data while making decisions on the fly, deep reinforcement learning algorithms are especially effective at such pricing decisions. In e-commerce scenarios, multiple reinforcement learning agents can set prices based on their competitors' prices. Prior research suggests that such agents might consequently end up in a state of collusion in the long run. To analyze this issue further, we build a scenario based on a modified version of the prisoner's dilemma in which three agents play the game of rock paper scissors. Our results indicate that action selection can be dissected into specific stages, which makes it possible to develop collusion prevention systems that recognize situations that might lead to collusion between competitors. We further provide evidence of a situation in which agents can carry out a tacit cooperation strategy without being explicitly trained to do so.