We present a framework to integrate tensor network (TN) methods with reinforcement learning (RL) for solving dynamical optimisation tasks. We consider the RL actor-critic method, a model-free approach for solving RL problems, and introduce TNs as the approximators for its policy and value functions. Our "actor-critic with tensor networks" (ACTeN) method is especially well suited to problems with large and factorisable state and action spaces. As an illustration of the applicability of ACTeN, we solve the exponentially hard task of sampling rare trajectories in two paradigmatic stochastic models, the East model of glasses and the asymmetric simple exclusion process (ASEP), the latter being particularly challenging to other methods due to the absence of detailed balance. With substantial potential for further integration with the vast array of existing RL methods, the approach introduced here is promising both for applications in physics and for multi-agent RL problems more generally.
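To make the idea of a tensor-network function approximator concrete, the following is a minimal sketch (not the paper's implementation) of a matrix product state (MPS) acting as a value-function approximator for a chain of binary spins, as would arise in lattice models such as the East model or the ASEP. The names and parameters (`N`, `D`, `mps_value`) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Illustrative sketch: an MPS evaluated as V(s) for a length-N binary spin configuration.
N = 10          # number of lattice sites
D = 4           # bond dimension controlling the expressiveness of the ansatz
rng = np.random.default_rng(0)

# One rank-3 tensor per site: (left bond, physical index in {0,1}, right bond).
# Boundary bonds have dimension 1 so the full contraction yields a scalar.
tensors = (
    [rng.normal(size=(1, 2, D)) * 0.5]
    + [rng.normal(size=(D, 2, D)) * 0.5 for _ in range(N - 2)]
    + [rng.normal(size=(D, 2, 1)) * 0.5]
)

def mps_value(state):
    """Contract the MPS along the spin configuration `state` (length-N array of 0/1)."""
    left = np.ones((1,))
    for site, s in enumerate(state):
        # Select the physical index at this site and absorb the bond matrix.
        left = left @ tensors[site][:, s, :]
    return float(left[0])

config = rng.integers(0, 2, size=N)   # e.g. an occupation pattern of the lattice
print(config, mps_value(config))
```

In an actor-critic setting such as ACTeN, tensors of this kind would be the trainable parameters of the policy and value functions; the factorised structure of the MPS is what makes large, factorisable state spaces tractable, though the specific parameterisation above is only an assumed example.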