Reinforcement learning synthesizes controllers without prior knowledge of the system. At each time step, a reward is given, and the controller optimizes the discounted sum of these rewards, $\sum_{t=0}^{\infty} \gamma^t r_t$ for a discount factor $0 \leq \gamma < 1$. Applying this class of algorithms requires designing a reward scheme, which is typically done manually. The designer must ensure that their intent is accurately captured; this may not be trivial and is prone to error. An alternative to this manual programming, akin to programming directly in assembly, is to specify the objective in a formal language and have it "compiled" to a reward scheme. Mungojerrie (https://plv.colorado.edu/mungojerrie/) is a tool for testing reward schemes for $\omega$-regular objectives on finite models. The tool contains reinforcement learning algorithms and a probabilistic model checker. Mungojerrie supports models specified in PRISM and $\omega$-automata specified in HOA.
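As a minimal sketch of these two input formats (the module name, action labels, and choice of objective below are hypothetical illustrations, not taken from Mungojerrie's distribution), a small PRISM MDP with a labeled goal state might look like this:

```
mdp

module simple
  s : [0..2] init 0;
  // action "go" in state 0 succeeds with probability 0.9
  [go]   s=0 -> 0.9:(s'=1) + 0.1:(s'=0);
  [go]   s=1 -> 1.0:(s'=2);
  [stay] s=2 -> 1.0:(s'=2);
endmodule

label "goal" = s=2;
```

A deterministic Büchi automaton in HOA format for the $\omega$-regular objective "eventually goal" could then be:

```
HOA: v1
States: 2
Start: 0
AP: 1 "goal"
acc-name: Buchi
Acceptance: 1 Inf(0)
--BODY--
State: 0
  [0]  1   /* on "goal", move to the accepting sink */
  [!0] 0   /* otherwise, keep waiting */
State: 1 {0}
  [t] 1    /* accepting sink: loop forever */
--END--
```

Roughly, a tool in this style composes the model with the automaton, derives a reward scheme from the acceptance condition, and hands the product to its learning algorithms; the specific reward schemes Mungojerrie implements are described in its documentation.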