Reinforcement Learning (RL) is a promising approach for solving various control, optimization, and sequential decision-making tasks. However, designing reward functions for complex tasks (e.g., tasks with multiple objectives and safety constraints) can be challenging for most users and usually requires multiple expensive trial-and-error iterations (reward function hacking). In this paper we propose a specification language (Inkling Goal Specification) for complex control and optimization tasks that is very close to natural language and allows a practitioner to focus on problem specification instead of reward function hacking. The core elements of our framework are: (i) a mapping from the high-level language to a predicate temporal logic tailored to control and optimization tasks, (ii) a novel automaton-guided dense reward generation scheme that can be used to drive RL algorithms, and (iii) a set of performance metrics to assess the behavior of the system. We include a set of experiments showing that the proposed method makes it easy to specify a wide range of real-world tasks, and that the generated reward is able to drive policy training to achieve the specified goals.
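To make the idea of automaton-guided dense reward generation concrete, the sketch below is a minimal, hypothetical illustration only, not the paper's Inkling Goal Specification or its actual reward machinery. It assumes a toy scalar state, and the names GoalNode, GoalAutomaton, and step_reward are invented for illustration: a sequential goal such as "reach a target region, then stay in it" is represented as a small automaton whose active node emits a dense progress-based reward at every step.

```python
# Hypothetical sketch of automaton-guided dense reward shaping.
# Not the paper's implementation; names and structure are illustrative.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GoalNode:
    """One stage of the goal: a predicate plus a dense progress measure."""
    name: str
    satisfied: Callable[[float], bool]   # predicate over the (scalar) state
    progress: Callable[[float], float]   # in [0, 1]; 1.0 means the predicate holds


class GoalAutomaton:
    """Sequential automaton over goal stages; rewards progress on the active stage."""

    def __init__(self, stages: List[GoalNode]):
        self.stages = stages
        self.current = 0

    def step_reward(self, x: float) -> float:
        stage = self.stages[self.current]
        reward = stage.progress(x)                 # dense shaping signal
        if stage.satisfied(x) and self.current < len(self.stages) - 1:
            self.current += 1                      # advance to the next goal stage
            reward += 1.0                          # bonus for completing a stage
        return reward


# Toy goal: "reach the region around 10.0, then stay within it".
target, tol = 10.0, 0.5
reach = GoalNode(
    "reach_target",
    satisfied=lambda x: abs(x - target) <= tol,
    progress=lambda x: max(0.0, 1.0 - abs(x - target) / target),
)
stay = GoalNode(
    "stay_in_region",
    satisfied=lambda x: abs(x - target) <= tol,
    progress=lambda x: 1.0 if abs(x - target) <= tol else 0.0,
)

automaton = GoalAutomaton([reach, stay])
for x in [2.0, 6.0, 9.8, 10.1, 10.3]:              # a toy state trajectory
    print(f"x={x:4.1f}  reward={automaton.step_reward(x):.2f}")
```

In an RL loop, such a per-step reward would replace a hand-tuned reward function: the practitioner states the goal stages declaratively, and the dense progress signal gives the policy gradient something to climb even before any stage is fully satisfied.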