The mobile network that millions of people use every day is one of the most complex systems in the real world. Optimizing a mobile network to meet exploding customer demand while reducing CAPEX/OPEX poses greater challenges than prior work has addressed. Indeed, learning to solve complex real-world problems to benefit everyone and make the world better has long been an ultimate goal of AI. However, applying deep reinforcement learning (DRL) to complex real-world problems remains unsolved, owing to imperfect information, data scarcity, complex real-world rules, potential negative impact on the real world, and so on. To bridge this reality gap, we propose a sim-to-real framework that directly transfers learning from simulation to the real world without any training in the real world. First, we distill the temporal-spatial relationships between cells and mobile users into a scalable 3D image-like tensor that best characterizes the partially observed mobile network. Second, inspired by AlphaGo, we introduce a novel self-play mechanism that empowers DRL agents to gradually improve their intelligence by competing for the best record on multiple tasks, just as athletes compete for world records in the decathlon. Third, we propose a decentralized DRL method that coordinates multiple agents to compete and cooperate as a team, maximizing the global reward while minimizing potential negative impact. Using 7,693 unseen test tasks over 160 unseen mobile networks in another simulator, as well as 6 field trials on 4 commercial mobile networks in the real world, we demonstrate that this sim-to-real framework can directly transfer learning not only from one simulator to another, but also from simulation to the real world. This is the first time a DRL agent has successfully transferred its learning directly from simulation to very complex real-world problems with imperfect information, complex rules, a huge state/action space, and multi-agent interactions.
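To make the first contribution concrete, the sketch below rasterizes cell and user positions onto a 3D image-like tensor with per-feature channels. This is a minimal illustration, not the paper's exact encoding: the grid size, the choice of channels (cell load, user density, signal quality), and the helper `encode_network` are all hypothetical.

```python
import numpy as np

# Illustrative sketch only: grid resolution and channel semantics are
# assumptions, not the paper's specified state representation.
GRID = 32          # spatial resolution of the coverage-area grid
N_CHANNELS = 3     # 0: cell load, 1: user density, 2: best observed signal

def encode_network(cells, users, area=1000.0):
    """Rasterize cells and users onto a (N_CHANNELS, GRID, GRID) tensor.

    cells: iterable of (x, y, load); users: iterable of (x, y, signal).
    Coordinates are metres inside a square area of side `area`.
    """
    tensor = np.zeros((N_CHANNELS, GRID, GRID), dtype=np.float32)
    scale = GRID / area
    for x, y, load in cells:
        i = min(int(y * scale), GRID - 1)
        j = min(int(x * scale), GRID - 1)
        tensor[0, i, j] = max(tensor[0, i, j], load)      # cell load map
    for x, y, sig in users:
        i = min(int(y * scale), GRID - 1)
        j = min(int(x * scale), GRID - 1)
        tensor[1, i, j] += 1.0                            # user count per pixel
        tensor[2, i, j] = max(tensor[2, i, j], sig)       # strongest signal seen
    tensor[1] /= max(tensor[1].max(), 1.0)                # normalize density
    return tensor

cells = [(100.0, 200.0, 0.7), (800.0, 600.0, 0.4)]
users = [(120.0, 210.0, 0.9), (790.0, 610.0, 0.5), (500.0, 500.0, 0.3)]
obs = encode_network(cells, users)
print(obs.shape)  # (3, 32, 32)
```

Because the observation is image-like and its spatial extent is fixed by the grid rather than by the number of cells or users, the same encoding scales across networks of different sizes, which is what allows a convolutional DRL policy trained in one simulator to be applied unchanged elsewhere.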