We test the performance of the deep deterministic policy gradient (DDPG) algorithm, a deep reinforcement learning method capable of handling continuous state and action spaces, in learning Nash equilibria in a setting where firms compete in prices. Such algorithms are typically considered model-free because they require neither transition probability functions (as in, e.g., Markov games) nor predefined functional forms. Despite being model-free, a large set of parameters is used in various steps of the algorithm, e.g., learning rates, memory buffers, state-space dimensioning, normalizations, and noise decay rates. The purpose of this work is to systematically test the effect of these parameter configurations on convergence to the analytically derived Bertrand equilibrium. We find parameter choices that reach convergence rates of up to 99%. This reliable convergence may make the method a useful tool for studying the strategic behavior of firms even in more complex settings.

Keywords: Bertrand Equilibrium, Competition in Uniform Price Auctions, Deep Deterministic Policy Gradient Algorithm, Parameter Sensitivity Analysis
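To make the scope of such a sensitivity analysis concrete, the following is a minimal, hypothetical sketch (not the authors' code) of a DDPG hyperparameter configuration covering the kinds of parameters named above; all names and default values are illustrative assumptions.

```python
# Hypothetical sketch of the DDPG hyperparameters a sensitivity analysis
# might vary; values are illustrative, not taken from the paper.
from dataclasses import dataclass

@dataclass
class DDPGConfig:
    actor_lr: float = 1e-4        # actor learning rate
    critic_lr: float = 1e-3       # critic learning rate
    buffer_size: int = 100_000    # replay (memory) buffer capacity
    batch_size: int = 64          # minibatch size sampled from the buffer
    gamma: float = 0.99           # discount factor
    tau: float = 0.005            # soft-update rate for target networks
    noise_sigma: float = 0.2      # initial exploration noise scale
    noise_decay: float = 0.999    # per-episode decay of exploration noise
    state_dim: int = 2            # state-space dimensioning (e.g., past prices)
    normalize_obs: bool = True    # whether observations are normalized

# Example grid of configurations a parameter sweep could iterate over.
configs = [DDPGConfig(actor_lr=lr, noise_decay=d)
           for lr in (1e-4, 1e-3)
           for d in (0.995, 0.999)]
```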