Using a model heat engine, we show that neural network-based reinforcement learning can identify thermodynamic trajectories of maximal efficiency. We consider both gradient-based and gradient-free reinforcement learning. We use an evolutionary learning algorithm to evolve a population of neural networks, subject to a directive to maximize the efficiency of a trajectory composed of a set of elementary thermodynamic processes; the resulting networks learn to carry out the maximally efficient Carnot, Stirling, or Otto cycles. When given an additional irreversible process, this evolutionary scheme learns a previously unknown thermodynamic cycle. Gradient-based reinforcement learning is able to learn the Stirling cycle, whereas the evolutionary approach achieves the optimal Carnot cycle. Our results show how the reinforcement learning strategies developed for game playing can be applied to solve physical problems conditioned upon path-extensive order parameters.
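The evolutionary scheme described above — mutating a population of candidate controllers and selecting those that achieve the highest efficiency — can be illustrated with a minimal sketch. This is not the paper's implementation; it is a toy mutation-and-selection loop over plain parameter vectors, with a hypothetical `toy_efficiency` objective standing in for the thermodynamic efficiency of a trajectory.

```python
import random

def evolve(fitness, dim=4, pop_size=20, generations=100, sigma=0.1, seed=0):
    """Minimal evolutionary search: mutate a population of parameter
    vectors with Gaussian noise and keep the fittest individuals."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        # Produce a mutated child from every individual.
        children = [[x + rng.gauss(0, sigma) for x in ind] for ind in pop]
        # Select the best pop_size individuals among parents and children.
        pop = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
    return pop[0]

# Hypothetical stand-in objective: maximal when every parameter equals 0.5,
# playing the role of the trajectory efficiency in the actual scheme.
def toy_efficiency(params):
    return -sum((x - 0.5) ** 2 for x in params)

best = evolve(toy_efficiency)
```

In the paper's setting, each individual would instead encode a neural network proposing elementary thermodynamic processes, and the fitness would be the efficiency of the resulting cycle; the select-and-mutate loop is the same in spirit.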