We introduce HUBERT, which combines the structured-representational power of Tensor-Product Representations (TPRs) with BERT, a pre-trained bidirectional Transformer language model. We show that there is shared structure between different NLP datasets that HUBERT, but not BERT, is able to learn and leverage. We validate the effectiveness of our model on the GLUE benchmark and the HANS dataset. Our experimental results show that untangling data-specific semantics from general language structure is key to better transfer across NLP tasks.
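To make the TPR idea concrete, the following is a minimal sketch of a role-filler binding layer in the spirit of Tensor-Product Representations: each contextual token embedding softly selects a filler vector and a role vector from learned dictionaries, and the two are bound via an outer product. All dimensions, parameter names, and the attention-based selection here are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class TPRBinding(nn.Module):
    """Illustrative role-filler binding layer (hypothetical, not HUBERT's exact design).

    Each token's hidden state softly selects a role and a filler from learned
    dictionaries; the two are bound with an outer product (the tensor product
    at the core of TPRs) and flattened back into a vector.
    """

    def __init__(self, hidden_dim=768, n_roles=32, n_fillers=32,
                 role_dim=32, filler_dim=32):
        super().__init__()
        # Learned dictionaries of role and filler vectors.
        self.role_emb = nn.Parameter(torch.randn(n_roles, role_dim))
        self.filler_emb = nn.Parameter(torch.randn(n_fillers, filler_dim))
        # Linear scorers that pick a soft role / filler for each token.
        self.role_attn = nn.Linear(hidden_dim, n_roles)
        self.filler_attn = nn.Linear(hidden_dim, n_fillers)

    def forward(self, hidden_states):  # (batch, seq, hidden_dim)
        # Soft selection over the role and filler dictionaries.
        roles = torch.softmax(self.role_attn(hidden_states), dim=-1) @ self.role_emb      # (b, s, role_dim)
        fillers = torch.softmax(self.filler_attn(hidden_states), dim=-1) @ self.filler_emb  # (b, s, filler_dim)
        # Tensor-product binding: outer product of filler and role per token.
        bound = torch.einsum('bsf,bsr->bsfr', fillers, roles)
        # Flatten so a downstream task classifier can consume the result.
        return bound.flatten(start_dim=-2)  # (b, s, filler_dim * role_dim)
```

In such a setup, the final hidden states of a pre-trained BERT encoder would be passed through this layer before the task-specific classifier, so that the role dictionary can capture general structural information while the fillers carry task- or data-specific content.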