Learning efficient and interpretable policies has been a challenging task in reinforcement learning (RL), particularly in the visual RL setting with complex scenes. While neural networks have achieved competitive performance, the resulting policies are often over-parameterized black boxes that are difficult to interpret and deploy efficiently. More recent symbolic RL frameworks have shown that high-level, domain-specific programming logic can be designed to handle both policy learning and symbolic planning. However, these approaches rely on hand-coded primitives with little feature learning, and when applied to high-dimensional visual scenes, they can suffer from scalability issues and perform poorly when images contain complex object interactions. To address these challenges, we propose \textit{Differentiable Symbolic Expression Search} (DiffSES), a novel symbolic learning approach that discovers discrete symbolic policies using partially differentiable optimization. By using object-level abstractions instead of raw pixel-level inputs, DiffSES is able to leverage the simplicity and scalability advantages of symbolic expressions, while also incorporating the strengths of neural networks for feature learning and optimization. Our experiments demonstrate that DiffSES generates symbolic policies that are simpler and more scalable than those of state-of-the-art symbolic RL methods, while requiring less symbolic prior knowledge.