Chinese pre-trained language models usually process text as a sequence of characters, while ignoring coarser granularities, e.g., words. In this work, we propose a novel pre-training paradigm for Chinese -- Lattice-BERT, which explicitly incorporates word representations along with characters, and thus can model a sentence in a multi-granularity manner. Specifically, we construct a lattice graph from the characters and words in a sentence and feed all these text units into transformers. We design a lattice position attention mechanism to exploit the lattice structures in self-attention layers. We further propose a masked segment prediction task to push the model to learn from the rich but redundant information inherent in lattices, while avoiding learning unexpected tricks. Experiments on 11 Chinese natural language understanding tasks show that our model brings an average improvement of 1.5% under the 12-layer setting, achieving a new state of the art among base-size models on the CLUE benchmarks. Further analysis shows that Lattice-BERT can harness the lattice structures, and the improvement comes from the exploitation of redundant information and multi-granularity representations. Our code will be available at https://github.com/alibaba/pretrained-language-models/LatticeBERT.
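To make the lattice construction concrete, below is a minimal sketch (not the authors' implementation) of how characters and lexicon words in a sentence can be collected into overlapping units with span positions; the `build_lattice` function and the small lexicon are hypothetical, and the resulting (start, end) spans are the kind of information a lattice position attention mechanism could consume.

```python
# Minimal sketch of character-word lattice construction for a Chinese sentence.
# The lexicon here is a toy, hypothetical word list; in practice it would come
# from a large word vocabulary.

def build_lattice(sentence, lexicon, max_word_len=4):
    """Return lattice units as (text, start, end) spans over the sentence.

    Every character is a unit; every lexicon word found in the sentence is
    added as an additional unit, so spans may overlap -- this overlap is the
    rich but redundant information the abstract refers to.
    """
    units = [(ch, i, i + 1) for i, ch in enumerate(sentence)]  # character units
    for start in range(len(sentence)):
        for end in range(start + 2, min(start + max_word_len, len(sentence)) + 1):
            word = sentence[start:end]
            if word in lexicon:
                units.append((word, start, end))  # word unit spanning [start, end)
    return units


if __name__ == "__main__":
    lexicon = {"研究", "研究生", "生命", "起源"}  # hypothetical lexicon
    # Characters plus overlapping words form the lattice for the sentence.
    for text, start, end in build_lattice("研究生命的起源", lexicon):
        print(f"{text}\t[{start}, {end})")
```

In this sketch, both the word 研究生 and the words 研究 / 生命 survive as lattice units over the same characters, so the model sees multiple plausible segmentations at once rather than committing to a single word sequence.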