We consider a finite-horizon Mean Field Control problem for Markovian models. The objective function is a sum of convex and Lipschitz functions defined on a space of state-action distributions. We introduce an iterative algorithm which we prove to be a Mirror Descent associated with a non-standard Bregman divergence, with a convergence rate of order $1/\sqrt{K}$. Each iteration requires the solution of a simple dynamic programming problem. We compare this algorithm with learning methods for Mean Field Games, after providing a reformulation of our control problem as a game problem. These theoretical contributions are illustrated with numerical examples applied to a demand-side management problem for power systems, aimed at controlling the average power consumption profile of a population of flexible devices contributing to power system balance.
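As a loose illustration of the generic mirror descent template the abstract refers to, the following minimal Python sketch runs entropic mirror descent (exponentiated gradient, i.e. the standard KL Bregman divergence rather than the paper's non-standard one) on a probability simplex; with step sizes of order $1/\sqrt{k}$, the averaged iterate of a convex Lipschitz objective converges at the stated $1/\sqrt{K}$ rate. The gradient oracle `grad_F` and target profile `rho` are hypothetical stand-ins, and a plain gradient step replaces the dynamic programming subproblem that the paper's actual algorithm solves at each iteration.

```python
import numpy as np

def mirror_descent_simplex(grad_F, d, K):
    """Entropic mirror descent on the probability simplex.

    Illustrative sketch only: the paper's algorithm uses a non-standard
    Bregman divergence and a dynamic programming subproblem per iteration,
    whereas this uses the KL divergence and a gradient oracle.
    """
    mu = np.full(d, 1.0 / d)           # uniform initial distribution
    avg = np.zeros(d)                  # running average of the iterates
    for k in range(1, K + 1):
        tau = 1.0 / np.sqrt(k)         # step size giving the 1/sqrt(K) rate
        g = grad_F(mu)                 # (sub)gradient of the objective at mu
        mu = mu * np.exp(-tau * g)     # mirror step for the KL divergence
        mu /= mu.sum()                 # renormalize onto the simplex
        avg += (mu - avg) / k          # incremental average
    return avg

# Hypothetical example: steer a distribution toward a target profile rho,
# minimizing the convex objective ||mu - rho||^2.
rho = np.array([0.1, 0.2, 0.3, 0.4])
grad_F = lambda mu: 2.0 * (mu - rho)   # gradient of ||mu - rho||^2
print(mirror_descent_simplex(grad_F, d=4, K=2000))
```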