Applications of Multi-Agent Reinforcement Learning in Future Internet: A Comprehensive Survey

The future Internet involves several emerging technologies such as 5G and beyond-5G networks, vehicular networks, unmanned aerial vehicle (UAV) networks, and the Internet of Things (IoT). Moreover, the future Internet is becoming heterogeneous and decentralized, with a large number of network entities involved. Each entity may need to make its own local decisions to improve network performance under dynamic and uncertain network environments. Standard learning algorithms such as single-agent Reinforcement Learning (RL) or Deep Reinforcement Learning (DRL) have recently been used to enable each network entity, acting as an agent, to learn an optimal decision-making policy adaptively by interacting with the unknown environment. However, such algorithms fail to model cooperation or competition among network entities and simply treat other entities as part of the environment. Because those entities are themselves learning and changing their behavior, this may result in the non-stationarity issue. Multi-Agent Reinforcement Learning (MARL) allows each network entity to learn its optimal policy by observing not only the environment but also other entities' policies. As a result, MARL can significantly improve the learning efficiency of the network entities, and it has recently been used to solve various issues in emerging networks. In this paper, we therefore review the applications of MARL in emerging networks. In particular, we provide a tutorial on MARL and a comprehensive survey of its applications in the next-generation Internet. We first introduce single-agent RL and MARL. Then, we review a number of applications of MARL to emerging issues in the future Internet: network access, transmit power control, computation offloading, content caching, packet routing, trajectory design for UAV-aided networks, and network security.
