Despite achieving superior performance on human-level control problems, deep reinforcement learning (DRL) lacks the high-order intelligence of humans (e.g., logical deduction and reuse), so it is less effective than humans at learning and generalizing in complex problems. Prior work attempts to directly synthesize a white-box logic program as the DRL policy, manifesting logic-driven behaviors. However, most synthesis methods are built on either imperative or declarative programming, and each paradigm has a distinct limitation. The former ignores cause-effect logic during synthesis, resulting in low generalizability across tasks. The latter is strictly proof-based and thus fails to synthesize programs with complex hierarchical logic. In this paper, we combine the two paradigms and propose a novel Generalizable Logic Synthesis (GALOIS) framework to synthesize hierarchical, strictly cause-effect logic programs. GALOIS leverages program sketches and defines a new sketch-based hybrid programming language to guide the synthesis. Building on this language, GALOIS introduces a sketch-based program synthesis method that automatically generates white-box programs with generalizable and interpretable cause-effect logic. Extensive evaluations on decision-making tasks with complex logic demonstrate the superiority of GALOIS over mainstream baselines in terms of asymptotic performance, generalizability, and knowledge reusability across different environments.
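To give a concrete sense of the sketch-based synthesis idea the abstract refers to, the following is a minimal illustrative sketch in Python, not GALOIS's actual language or algorithm: a policy template leaves one guard condition as a hole, and an enumerative search fills the hole with the candidate predicate consistent with a set of example state-action pairs. The feature names, candidate grammar, and example traces are all assumptions made up for illustration.

```python
# A minimal, hypothetical sketch-based synthesis example (assumed
# setup; not GALOIS's actual DSL or synthesis procedure): a policy
# sketch fixes the control structure but leaves its guard as a hole,
# filled by enumerating candidates against example traces.

from itertools import product

# Hypothetical candidate grammar for the hole: threshold comparisons
# over two assumed state features.
FEATURES = ["dist_to_goal", "dist_to_obstacle"]
THRESHOLDS = [1.0, 2.0, 3.0]

def candidate_guards():
    """Enumerate candidate predicates (state -> bool) for the hole."""
    for feat, thr in product(FEATURES, THRESHOLDS):
        yield f"{feat} < {thr}", (lambda s, f=feat, t=thr: s[f] < t)

def sketch_policy(guard):
    """Policy sketch: behavior is fixed except for the guard hole."""
    def policy(state):
        return "avoid" if guard(state) else "advance"
    return policy

# Example state-action pairs the completed program must reproduce.
EXAMPLES = [
    ({"dist_to_goal": 5.0, "dist_to_obstacle": 0.5}, "avoid"),
    ({"dist_to_goal": 4.0, "dist_to_obstacle": 2.5}, "advance"),
    ({"dist_to_goal": 0.5, "dist_to_obstacle": 1.5}, "advance"),
]

def synthesize():
    """Return the first hole completion consistent with all examples."""
    for name, guard in candidate_guards():
        policy = sketch_policy(guard)
        if all(policy(s) == a for s, a in EXAMPLES):
            return name, policy
    return None, None

if __name__ == "__main__":
    name, _ = synthesize()
    print("filled hole:", name)  # -> "dist_to_obstacle < 1.0"
```

Because the result is an explicit program rather than network weights, the filled-in condition can be read, verified, and reused across tasks, which is the white-box property the abstract emphasizes.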