We define a novel neuro-symbolic framework, argumentative reward learning, which combines preference-based argumentation with existing approaches to reinforcement learning from human feedback. Our method improves on prior work by generalising human preferences, reducing the burden on the user, and increasing the robustness of the reward model. We demonstrate these benefits in a series of experiments.