We introduce a neuro-symbolic natural logic framework based on reinforcement learning with introspective revision. The model samples and rewards specific reasoning paths through policy gradient; the introspective revision algorithm modifies intermediate symbolic reasoning steps to discover reward-earning operations and leverages external knowledge to alleviate spurious reasoning and training inefficiency. The framework is supported by properly designed local relation models that avoid input entangling, which helps ensure the interpretability of the proof paths. The proposed model has built-in interpretability and, compared with previous models on existing datasets, shows superior capability in monotonicity inference, systematic generalization, and interpretability.
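To make the training signal concrete, the sketch below illustrates one plausible way a REINFORCE-style policy gradient update over sampled natural-logic relations could be combined with a revision hook before the reward is computed. It is a minimal illustration under our own assumptions, not the authors' implementation: the names `RelationPolicy`, `introspective_revision`, `reinforce_step`, and `reward_fn` are hypothetical placeholders, and the revision hook is left as a stub where the actual algorithm would edit intermediate symbolic steps.

```python
# Minimal, illustrative sketch (not the authors' code) of REINFORCE-style
# training over symbolic natural-logic operation sequences, with a hypothetical
# introspective_revision hook applied to sampled steps before the reward.
import torch
import torch.nn as nn

NUM_RELATIONS = 7  # the seven basic natural-logic relations


class RelationPolicy(nn.Module):
    """Scores local natural-logic relations from local (per-step) features only,
    reflecting the paper's emphasis on local relation models."""

    def __init__(self, feat_dim: int, num_relations: int = NUM_RELATIONS):
        super().__init__()
        self.scorer = nn.Linear(feat_dim, num_relations)

    def forward(self, pair_feats: torch.Tensor) -> torch.Tensor:
        # pair_feats: (steps, feat_dim), one local feature vector per reasoning step
        return torch.log_softmax(self.scorer(pair_feats), dim=-1)


def introspective_revision(actions, log_probs, lexical_knowledge):
    """Hypothetical hook: revise intermediate symbolic steps (e.g., guided by
    external lexical knowledge) so the path can reach a reward-earning label."""
    return actions  # placeholder: the real algorithm edits selected steps


def reinforce_step(policy, pair_feats, gold_label, reward_fn, optimizer, knowledge=None):
    log_probs = policy(pair_feats)                    # (steps, num_relations)
    dist = torch.distributions.Categorical(logits=log_probs)
    actions = dist.sample()                           # one sampled relation per step
    actions = introspective_revision(actions, log_probs, knowledge)
    reward = reward_fn(actions, gold_label)           # e.g., +1 if the composed relation matches the label
    loss = -(dist.log_prob(actions).sum() * reward)   # REINFORCE objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward
```

In this sketch, the revision step sits between sampling and reward, which is the point of introspective revision in the abstract: paths that would otherwise earn no reward can be locally corrected, reducing spurious reasoning and wasted samples during training.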