Most reinforcement learning (RL) recommendation systems designed for edge computing must either synchronize during recommendation selection or rely on an unprincipled patchwork of algorithms. In this work, we build on asynchronous coagent policy gradient algorithms \citep{kostas2020asynchronous} to propose a principled solution to this problem. The class of algorithms we propose can be distributed over the internet and run asynchronously in real time. When a given edge node fails to respond to a request for data quickly enough, this is not a problem: the algorithm is designed to act and learn in the edge setting, where network issues are part of the environment. The result is a principled, theoretically grounded RL algorithm designed to be distributed across, and to learn in, this asynchronous environment. In this work, we describe this algorithm and a proposed class of architectures in detail, and demonstrate that they perform well in practice in the asynchronous setting, even as network quality degrades.