Human language understanding operates at multiple levels of granularity (e.g., words, phrases, and sentences) with increasing levels of abstraction that can be hierarchically combined. However, existing deep models with stacked layers do not explicitly model any sort of hierarchical process. This paper proposes a recursive Transformer model based on differentiable CKY-style binary trees to emulate the composition process. We extend the bidirectional language model pre-training objective to this architecture, attempting to predict each word given its left and right abstraction nodes. To scale up our approach, we also introduce an efficient pruned tree induction algorithm to enable encoding in just a linear number of composition steps. Experimental results on language modeling and unsupervised parsing show the effectiveness of our approach.
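To make the core idea concrete, the following is a minimal sketch (not the authors' implementation) of differentiable CKY-style binary composition: a chart is filled bottom-up, every span's representation is built from a soft, softmax-weighted mixture over its possible split points rather than CKY's hard argmax, so gradients flow through the induced tree. The module names (`CKYComposer`, `compose`, `score`) and the single-sentence interface are illustrative assumptions.

```python
# A hedged sketch of differentiable CKY-style binary composition.
# Assumption: a learned composition function over (left, right) span
# vectors and a scalar scorer for weighting split points.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CKYComposer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Composition function: maps a (left, right) pair to a parent vector.
        self.compose = nn.Sequential(nn.Linear(2 * dim, dim), nn.Tanh())
        # Scores how plausible a composed span is; used to weight split points.
        self.score = nn.Linear(dim, 1)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (n, dim) word embeddings. chart[i][j] holds the vector
        # for the span covering tokens i..j (inclusive).
        n, _ = tokens.shape
        chart = [[None] * n for _ in range(n)]
        for i in range(n):
            chart[i][i] = tokens[i]
        # Fill the chart bottom-up over increasing span lengths, as in CKY.
        for length in range(2, n + 1):
            for i in range(0, n - length + 1):
                j = i + length - 1
                # One candidate parent per split point k.
                cands = torch.stack([
                    self.compose(torch.cat([chart[i][k], chart[k + 1][j]]))
                    for k in range(i, j)
                ])  # shape: (length - 1, dim)
                # Soft (differentiable) selection over split points instead
                # of a hard argmax, so the tree structure is learnable.
                w = F.softmax(self.score(cands).squeeze(-1), dim=0)
                chart[i][j] = (w.unsqueeze(-1) * cands).sum(0)
        return chart[0][n - 1]  # root representation of the whole sentence

# Usage: encode a 5-token "sentence" of random embeddings.
enc = CKYComposer(dim=16)
root = enc(torch.randn(5, 16))
print(root.shape)  # torch.Size([16])
```

Note that this naive chart fill performs O(n^3) compositions; the pruned tree induction algorithm mentioned in the abstract exists precisely to cut this to a linear number of composition steps, a mechanism the sketch above deliberately omits.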