Reinforcement learning synthesizes controllers without prior knowledge of the system. At each timestep, a reward is given; the controllers optimize the discounted sum of these rewards. Applying this class of algorithms requires designing a reward scheme, a task that is typically done manually. The designer must ensure that the scheme accurately captures their intent, which may not be trivial and is prone to error. An alternative to this manual programming, which is akin to writing directly in assembly, is to specify the objective in a formal language and have it "compiled" into a reward scheme. Mungojerrie ($\href{https://plv.colorado.edu/mungojerrie/}{plv.colorado.edu/mungojerrie}$) is a tool for testing reward schemes for $\omega$-regular objectives on finite models. The tool contains reinforcement learning algorithms and a probabilistic model checker. Mungojerrie supports models specified in the PRISM language and $\omega$-automata specified in the HOA format.
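For concreteness, the discounted objective mentioned above is the standard return from reinforcement learning: with discount factor $\gamma \in [0,1)$ and reward $r_t$ received at timestep $t$, the learned controller maximizes the expectation of
\[
  G = \sum_{t=0}^{\infty} \gamma^{t}\, r_t .
\]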
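To make the input format concrete, the following is a minimal sketch of a Markov decision process in the PRISM language; the module, action, and label names here are illustrative and are not taken from the Mungojerrie distribution. The $\omega$-regular objective itself would be supplied separately, as an automaton in the HOA format.
\begin{verbatim}
// A two-state MDP: repeatedly flip a fair coin until heads comes up.
mdp

module coin
  s : [0..1] init 0;                      // 0 = tails so far, 1 = heads
  [flip] s=0 -> 0.5:(s'=0) + 0.5:(s'=1);  // fair coin flip
  [done] s=1 -> (s'=1);                   // absorb once heads is seen
endmodule

label "heads" = s=1;
\end{verbatim}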