Most state-of-the-art decision systems based on Reinforcement Learning (RL) are data-driven black-box neural models, which make it difficult to incorporate expert knowledge or to let experts review and validate the learned decision mechanisms. Knowledge insertion and model review are important requirements in many applications involving human health and safety. One way to bridge the gap between data-driven and knowledge-driven systems is program synthesis: replacing a neural network that outputs decisions with one that generates decision-making code in some programming language. We propose a new programming language, BF++, designed specifically for neural program synthesis in a Partially Observable Markov Decision Process (POMDP) setting, and generate programs for a number of standard OpenAI Gym benchmarks.