Here, we develop a deep learning algorithm for solving Principal-Agent (PA) mean field games with market-clearing conditions -- a class of problems that has thus far not been studied and that poses difficulties for standard numerical methods. We use an actor-critic approach to optimization, in which the agents reach a Nash equilibrium induced by the principal's penalty function, and the principal evaluates the resulting equilibrium. The inner problem's Nash equilibrium is obtained using a variant of the deep backward stochastic differential equation (BSDE) method, modified for McKean-Vlasov forward-backward SDEs in which both the forward and backward processes depend on the distribution. The outer problem's loss is further approximated by a neural net by sampling over the space of penalty functions. We apply our approach to a stylized PA problem arising in Renewable Energy Certificate (REC) markets, where agents may rent clean energy production capacity, trade RECs, and expand their long-term capacity to navigate the market at maximum profit. Our numerical results illustrate the efficacy of the algorithm and lead to interesting insights into the nature of optimal PA interactions in the mean-field limit of these markets.
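To make the deep BSDE idea concrete, the following is a minimal sketch on a hypothetical toy problem, not the paper's McKean-Vlasov REC model: the trainable initial value `y0` and a one-parameter stand-in `a` for the Z-network are fit by rolling the backward process forward and penalizing the terminal mismatch, exactly as in the standard deep BSDE scheme. The dynamics, driver, and terminal condition below are illustrative assumptions chosen so the answer is known in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative, not the paper's model):
# dX_t = dW_t, X_0 = 0; driver f = 0; terminal condition g(x) = x^2.
# Then Y_0 = E[X_T^2] = T and Z_t = 2 X_t, so the result can be checked.
T, M, N = 1.0, 20, 4000          # horizon, time steps, Monte Carlo paths
dt = T / M
dW = rng.normal(0.0, np.sqrt(dt), size=(N, M))
X = np.concatenate([np.zeros((N, 1)), np.cumsum(dW, axis=1)], axis=1)

# Deep BSDE idea: treat Y_0 and the Z-feedback as trainable parameters,
# simulate Y forward via Y_{k+1} = Y_k - f dt + Z_k dW_k (here f = 0),
# and minimize E[(Y_T - g(X_T))^2]. Z_t = a * X_t replaces a neural net.
y0, a = 0.0, 0.0
g = X[:, -1] ** 2                  # terminal payoff per path
S = np.sum(X[:, :-1] * dW, axis=1) # accumulated Z-basis * dW per path

lr = 0.3
for _ in range(300):
    resid = (y0 + a * S) - g       # Y_T - g(X_T)
    y0 -= lr * 2.0 * np.mean(resid)
    a  -= lr * 2.0 * np.mean(resid * S)

print(round(y0, 2), round(a, 2))   # should be near the true values 1.0 and 2.0
```

In the paper's setting the scalar parameters are replaced by neural networks, the forward SDE becomes McKean-Vlasov (coefficients depend on the empirical distribution of the paths), and the outer actor-critic loop retrains this inner solver for each candidate penalty function.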