Most state-of-the-art decision systems based on Reinforcement Learning (RL) are data-driven black-box neural models, in which it is often difficult to incorporate expert knowledge or to let experts review and validate the learned decision mechanisms. Knowledge insertion and model review are important requirements in many applications involving human health and safety. One way to bridge the gap between data-driven and knowledge-driven systems is program synthesis: replacing a neural network that outputs decisions with a symbolic program generated by a neural network or by means of genetic programming. We propose a new programming language, BF++, designed specifically for automatic programming of agents in a Partially Observable Markov Decision Process (POMDP) setting, and apply neural program synthesis to solve standard OpenAI Gym benchmarks.