Reinforcement learning holds tremendous promise for accelerator control. The primary goal of this paper is to show how this approach can be utilised at the operational level on accelerator physics problems. Despite the success of model-free reinforcement learning in several domains, sample efficiency remains a bottleneck, one that model-based methods may alleviate. We compare a purely model-based with a model-free reinforcement learning approach, both well suited to the task, applied to intensity optimisation on the FERMI FEL system. We find that the model-based approach demonstrates higher representational power and sample efficiency, while the asymptotic performance of the model-free method is slightly superior. The model-based algorithm is implemented in DYNA style using an uncertainty-aware model, and the model-free algorithm is based on tailored deep Q-learning. In both cases, the algorithms were implemented so as to increase robustness to noise, which is omnipresent in accelerator control problems. Code is released at https://github.com/MathPhysSim/FERMI_RL_Paper.
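To make the DYNA idea concrete, the sketch below shows a minimal Dyna-Q loop on a toy one-dimensional chain environment: each real transition updates the Q-table directly and is also stored in a learned model, which is then replayed for extra planning updates. The environment, hyperparameters, and tabular setting are illustrative assumptions only and are not the paper's uncertainty-aware, continuous-control implementation.

```python
import numpy as np

# Minimal Dyna-Q sketch on a toy 1-D chain (illustrative assumption, not the
# FERMI FEL setup): action 1 moves right, action 0 moves left, reward at goal.
N_STATES, N_ACTIONS = 10, 2
GOAL = N_STATES - 1

def step(s, a):
    """Toy deterministic dynamics for the chain environment."""
    s_next = min(max(s + (1 if a == 1 else -1), 0), GOAL)
    return s_next, (1.0 if s_next == GOAL else 0.0), s_next == GOAL

rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, N_ACTIONS))
model = {}                                   # learned model: (s, a) -> (s', r)
alpha, gamma, eps, n_planning = 0.1, 0.95, 0.1, 20

for episode in range(50):
    s, done = 0, False
    for _ in range(500):                     # step cap per episode
        # epsilon-greedy action selection (random tie-breaking)
        if rng.random() < eps or np.allclose(Q[s], Q[s][0]):
            a = int(rng.integers(N_ACTIONS))
        else:
            a = int(Q[s].argmax())
        s_next, r, done = step(s, a)

        # (1) direct RL update from the real transition
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

        # (2) store the transition in the learned model
        model[(s, a)] = (s_next, r)

        # (3) planning: replay model-generated transitions (the DYNA step)
        transitions = list(model.items())
        for _ in range(n_planning):
            (ps, pa), (pns, pr) = transitions[rng.integers(len(transitions))]
            Q[ps, pa] += alpha * (pr + gamma * Q[pns].max() - Q[ps, pa])

        s = s_next
        if done:
            break

print("Greedy policy (0=left, 1=right):", Q.argmax(axis=1))
```

The design choice illustrated here is the one the abstract contrasts: the planning loop squeezes many value updates out of every real interaction, which is why model-based methods tend to be more sample-efficient on the real machine.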