Reinforcement Learning Methods for Wordle: A POMDP/Adaptive Control Approach

In this paper, we address the solution of the popular Wordle puzzle using new reinforcement learning methods, which apply more generally to adaptive control of dynamic systems and to classes of Partially Observable Markov Decision Process (POMDP) problems. These methods are based on approximation in value space and the rollout approach, admit a straightforward implementation, and provide improved performance over various heuristic approaches. For the Wordle puzzle, they yield on-line solution strategies that are very close to optimal at relatively modest computational cost. Our methods are viable for more complex versions of Wordle and related search problems, for which an optimal strategy would be impossible to compute. They are also applicable to a wide range of adaptive sequential decision problems that involve an unknown or frequently changing environment whose parameters are estimated on-line.
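To make the rollout idea concrete, the following is a minimal Python sketch of one-step lookahead with a base heuristic on a toy Wordle instance. It is an illustration under stated assumptions, not the implementation from the paper: the belief state is taken to be a uniform distribution over the remaining candidate words, guesses are restricted to those candidates, and the base heuristic (`base_heuristic`, which simply guesses the first remaining word) as well as all function names are hypothetical choices made for this example.

```python
from collections import Counter


def feedback(guess, secret):
    """Wordle feedback string: 'g' = green, 'y' = yellow, 'b' = gray."""
    result = ["b"] * len(guess)
    unmatched = Counter()
    # First pass: mark greens and count the secret's unmatched letters.
    for i, (g, s) in enumerate(zip(guess, secret)):
        if g == s:
            result[i] = "g"
        else:
            unmatched[s] += 1
    # Second pass: mark yellows, consuming unmatched letters left to right.
    for i, g in enumerate(guess):
        if result[i] == "b" and unmatched[g] > 0:
            result[i] = "y"
            unmatched[g] -= 1
    return "".join(result)


def filter_candidates(candidates, guess, fb):
    """Belief update: keep the words consistent with the observed feedback."""
    return [w for w in candidates if feedback(guess, w) == fb]


def base_heuristic(candidates):
    """Hypothetical base policy (an assumption): guess the first candidate."""
    return candidates[0]


def base_cost(candidates, secret):
    """Number of guesses the base policy needs to find `secret`."""
    guesses = 0
    while True:
        guess = base_heuristic(candidates)
        guesses += 1
        if guess == secret:
            return guesses
        candidates = filter_candidates(candidates, guess, feedback(guess, secret))


def rollout_guess(candidates):
    """One-step lookahead: pick the guess minimizing the expected number of
    guesses, where the cost-to-go after the first guess is approximated by
    simulating the base policy (approximation in value space)."""
    best_guess, best_q = None, float("inf")
    for guess in candidates:
        total = 0
        for secret in candidates:  # uniform belief over remaining words
            if guess == secret:
                total += 1
            else:
                nxt = filter_candidates(candidates, guess, feedback(guess, secret))
                total += 1 + base_cost(nxt, secret)
        q = total / len(candidates)
        if q < best_q:
            best_guess, best_q = guess, q
    return best_guess, best_q


if __name__ == "__main__":
    words = ["crane", "crate", "trace", "slate", "stale"]
    print(rollout_guess(words))
```

By construction, the expected cost of the rollout choice in this sketch is never worse than that of following the base heuristic from the same belief state, because the base heuristic's own first guess is among the options the lookahead evaluates.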
