Communicating useful background knowledge to reinforcement learning (RL) agents is an important and effective method for accelerating learning. We introduce RLang, a domain-specific language (DSL) for communicating domain knowledge to an RL agent. Unlike existing DSLs proposed by the RL community, which ground to a single element of a decision-making formalism (e.g., the reward function or the policy), RLang can specify information about every element of a Markov decision process. We define precise syntax and grounding semantics for RLang, and provide a parser implementation that grounds RLang programs to an algorithm-agnostic partial world model and policy that can be exploited by an RL agent. We provide a series of example RLang programs and demonstrate how different RL methods can exploit the resulting knowledge, including model-free and model-based tabular algorithms, hierarchical approaches, and deep RL algorithms (including both policy gradient and value-based methods).
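To make the grounding target concrete, the following is a minimal Python sketch of what an algorithm-agnostic partial world model and policy could look like on the agent's side. The names here (`PartialWorldModel`, `choose_action`, the lava example) are illustrative assumptions for this sketch, not the paper's actual parser API; the only idea taken from the abstract is that any subset of MDP elements may be specified while the rest remain unknown and must be learned.

```python
from typing import Callable, Optional, Tuple

State = Tuple[int, int]

class PartialWorldModel:
    """Hypothetical container for whichever MDP elements an RLang-style
    program specifies; unspecified elements stay None (left to be learned)."""
    def __init__(
        self,
        reward: Optional[Callable[[State, int], float]] = None,
        transition: Optional[Callable[[State, int], State]] = None,
        policy: Optional[Callable[[State], Optional[int]]] = None,
    ):
        self.reward = reward          # partial reward function R(s, a)
        self.transition = transition  # partial transition function T(s, a)
        self.policy = policy          # partial policy pi(s)

# Assumed grounding of a program stating "stepping into lava yields -1"
# and advising one action in the start state; transitions stay unknown.
LAVA = {(1, 2), (3, 4)}

def lava_reward(state: State, action: int) -> float:
    return -1.0 if state in LAVA else 0.0

def start_advice(state: State) -> Optional[int]:
    return 0 if state == (0, 0) else None  # advice covers only one state

model = PartialWorldModel(reward=lava_reward, policy=start_advice)

def choose_action(state: State, learned_policy: Callable[[State], int]) -> int:
    """Follow the partial policy where it is defined; otherwise fall back
    on whatever the underlying RL algorithm has learned so far."""
    hint = model.policy(state) if model.policy is not None else None
    return hint if hint is not None else learned_policy(state)

print(choose_action((0, 0), lambda s: 1))  # 0: advice applies here
print(choose_action((5, 5), lambda s: 1))  # 1: falls back to learned policy
```

Because the container is just a bundle of optional callables, a model-based learner could seed its reward estimates from `model.reward`, while a model-free learner could ignore it and use only the policy advice, which is one plausible reading of "algorithm-agnostic" here.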