Transformer architectures have achieved great success on natural language tasks by learning strong language representations from large-scale unlabeled text. In this paper, we seek to go beyond this and explore a new logical inductive bias for better language representation learning. Logical reasoning is a formal methodology for deriving answers from given knowledge and facts. Inspired by this view, we develop a novel neural architecture named FOLNet (First-Order Logic Network) to encode this new inductive bias. We construct a set of neural logic operators as learnable Horn clauses, which are then forward-chained into a fully differentiable neural architecture (FOLNet). Interestingly, we find that the self-attention module in transformers can be composed of two of our neural logic operators, which probably explains their strong reasoning performance. Our proposed FOLNet has the same input and output interfaces as other pretrained models and thus can be pretrained and finetuned with similar losses. This also allows FOLNet to replace other pretrained models in a plug-and-play manner. With our logical inductive bias, the same set of ``logic deduction skills'' learned through pretraining is expected to be equally capable of solving diverse downstream tasks. For this reason, FOLNet learns language representations with much stronger transfer capabilities. Experimental results on several language understanding tasks show that our pretrained FOLNet model outperforms existing strong transformer-based approaches.
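To make the decomposition claim concrete, the sketch below shows one way self-attention can be read as the composition of two relational operators: one that derives a binary (pairwise) predicate from unary (per-token) predicates, and one that joins a binary predicate with unary predicates to produce new unary predicates. This is a minimal illustration, not the paper's implementation; the operator names `unary_to_binary` and `binary_join_unary` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def unary_to_binary(q, k):
    """Hypothetical operator 1: build a binary predicate (a pairwise
    relation over tokens) from unary predicates, analogous to the
    softmax(Q K^T / sqrt(d)) step of self-attention."""
    return softmax(q @ k.T / np.sqrt(q.shape[-1]))

def binary_join_unary(rel, v):
    """Hypothetical operator 2: apply a binary predicate to unary
    predicates, analogous to the attention-weighted sum over V."""
    return rel @ v

# Composing the two operators recovers single-head self-attention.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((5, 8)) for _ in range(3))
rel = unary_to_binary(q, k)          # (5, 5) pairwise relation
out = binary_join_unary(rel, v)      # (5, 8) new per-token features
```

Here each row of `rel` is a distribution over tokens, so the join step aggregates unary features according to the learned relation, which is exactly what the attention output does.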