Most state-of-the-art decision systems based on Reinforcement Learning (RL) are data-driven black-box neural models, in which it is often difficult to incorporate expert knowledge or to let experts review and validate the learned decision mechanisms. Knowledge insertion and model review are important requirements in many applications involving human health and safety. One way to bridge the gap between data-driven and knowledge-driven systems is program synthesis: replacing a neural network that outputs decisions with one that generates decision-making code in some programming language. We propose a new programming language, BF++, designed specifically for neural program synthesis in a Partially Observable Markov Decision Process (POMDP) setting, and generate programs for a number of standard OpenAI Gym benchmarks.