语言代表自监督学习的利特BERT (ALBERT: A Lite BERT for Self-supervised Learning of Language Representations) - 专知论文

会员服务 ·

0

语言表示 · MoDELS · BERT · Performer · Better ·

2019 年 9 月 26 日

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

翻译：语言代表自监督学习的利特BERT

Zhenzhong Lan,Mingda Chen,Sebastian Goodman,Kevin Gimpel,Piyush Sharma,Radu Soricut

Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations, longer training times, and unexpected model degradation. To address these problems, we present two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT. Comprehensive empirical evidence shows that our proposed methods lead to models that scale much better compared to the original BERT. We also use a self-supervised loss that focuses on modeling inter-sentence coherence, and show it consistently helps downstream tasks with multi-sentence inputs. As a result, our best model establishes new state-of-the-art results on the GLUE, RACE, and SQuAD benchmarks while having fewer parameters compared to BERT-large.

翻译：在对自然语言表现进行培训之前,自然语言表现的模型规模增加,往往导致下游任务业绩的改善;然而,由于GPU/TPU记忆限制、培训时间延长和意外模型退化,在某些时候,由于GPU/TPU记忆力的限制、培训时间的延长和意外模型退化,进一步模型增长变得更加困难;为了解决这些问题,我们提出了两个降低参数的技术,以降低记忆消耗,提高BERT的培训速度。综合经验证据表明,我们提出的方法导致模型的规模比原始的BERT要大得多。我们还使用了一种自我监督的损失,重点是模拟刑罚一致性,并表明它一贯地帮助下游任务,提供多种感应力投入。结果,我们的最佳模型在GLUE、RACE和SQUAD基准上确立了新的最新结果,而与BERT大基准相比参数则更少。

5

相关内容

语言表示

语言表示一直是人工智能、计算语言学领域的研究热点。从早期的离散表示到最近的分散式表示，语言表示的主要研究内容包括如何针对不同的语言单位，设计表示语言的数据结构以及和语言的转换机制，即如何将语言转换成计算机内部的数据结构（理解）以及由计算机内部表示转换成语言（生成）。

【DeepMind深度学习课程】无监督表示学习前沿进展，129页ppt，Unsupervised Representation Learning

【DeepMind深度学习课程】无监督表示学习前沿进展，129页ppt，Unsupervised Representation Learning

专知会员服务

79+阅读 · 2020年6月29日

【Facebook AI】自监督学习在计算机视觉应用最新概述，108页ppt Self-supervised learning

【Facebook AI】自监督学习在计算机视觉应用最新概述，108页ppt Self-supervised learning

专知会员服务

164+阅读 · 2020年4月19日

【伯克利】最新《深度半监督学习》总述，146页ppt，Semi-Supervised Learning

【伯克利】最新《深度半监督学习》总述，146页ppt，Semi-Supervised Learning

专知会员服务

147+阅读 · 2020年4月11日

【ACL2020-CMU-Google】MobileBERT:用于资源受限设备的任务无关“瘦版”BERT

【ACL2020-CMU-Google】MobileBERT:用于资源受限设备的任务无关“瘦版”BERT

专知会员服务

13+阅读 · 2020年4月9日

【图解自监督学习】《The Illustrated Self-Supervised Learning》by Amit Chaudhary

【图解自监督学习】《The Illustrated Self-Supervised Learning》by Amit Chaudhary

专知会员服务

43+阅读 · 2020年2月25日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

43+阅读 · 2020年1月28日

【Google论文强烈推荐】ALBERT:基于精简BERT的自我监督学习的语言表示，ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations

【Google论文强烈推荐】ALBERT:基于精简BERT的自我监督学习的语言表示，ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations

专知会员服务

24+阅读 · 2019年12月21日

【Google论文】ALBERT:自我监督学习语言表达的精简BERT

【Google论文】ALBERT:自我监督学习语言表达的精简BERT

专知会员服务

24+阅读 · 2019年11月4日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

AINLP

30+阅读 · 2019年9月8日

站在BERT肩膀上的NLP新秀们（PART III）

站在BERT肩膀上的NLP新秀们（PART III）

AINLP

11+阅读 · 2019年6月18日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

(TensorFlow)实时语义分割比较研究

(TensorFlow)实时语义分割比较研究

机器学习研究会

9+阅读 · 2018年3月12日

Pre-training Text Representations as Meta Learning

Arxiv

13+阅读 · 2020年4月12日

A Simple Framework for Contrastive Learning of Visual Representations

Arxiv

21+阅读 · 2020年2月13日

Unsupervised Cross-lingual Representation Learning at Scale

Arxiv

5+阅读 · 2019年11月5日

TinyBERT: Distilling BERT for Natural Language Understanding

TinyBERT: Distilling BERT for Natural Language Understanding

Arxiv

11+阅读 · 2019年9月23日

K-BERT: Enabling Language Representation with Knowledge Graph

K-BERT: Enabling Language Representation with Knowledge Graph

Arxiv

19+阅读 · 2019年9月17日

Semantics-aware BERT for Language Understanding

Arxiv

4+阅读 · 2019年9月5日

Pre-trained Language Model Representations for Language Generation

Arxiv

5+阅读 · 2019年4月1日

BERT for Joint Intent Classification and Slot Filling

Arxiv

13+阅读 · 2019年2月28日

BioBERT: a pre-trained biomedical language representation model for biomedical text mining

BioBERT: a pre-trained biomedical language representation model for biomedical text mining

Arxiv

7+阅读 · 2019年2月3日

Deep contextualized word representations

Arxiv

10+阅读 · 2018年3月22日

VIP会员

文章信息

相关主题

相关VIP内容

【DeepMind深度学习课程】无监督表示学习前沿进展，129页ppt，Unsupervised Representation Learning

【DeepMind深度学习课程】无监督表示学习前沿进展，129页ppt，Unsupervised Representation Learning

专知会员服务

79+阅读 · 2020年6月29日

【Facebook AI】自监督学习在计算机视觉应用最新概述，108页ppt Self-supervised learning

【Facebook AI】自监督学习在计算机视觉应用最新概述，108页ppt Self-supervised learning

专知会员服务

164+阅读 · 2020年4月19日

【伯克利】最新《深度半监督学习》总述，146页ppt，Semi-Supervised Learning

【伯克利】最新《深度半监督学习》总述，146页ppt，Semi-Supervised Learning

专知会员服务

147+阅读 · 2020年4月11日

【ACL2020-CMU-Google】MobileBERT:用于资源受限设备的任务无关“瘦版”BERT

【ACL2020-CMU-Google】MobileBERT:用于资源受限设备的任务无关“瘦版”BERT

专知会员服务

13+阅读 · 2020年4月9日

【图解自监督学习】《The Illustrated Self-Supervised Learning》by Amit Chaudhary

【图解自监督学习】《The Illustrated Self-Supervised Learning》by Amit Chaudhary

专知会员服务

43+阅读 · 2020年2月25日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

43+阅读 · 2020年1月28日

【Google论文强烈推荐】ALBERT:基于精简BERT的自我监督学习的语言表示，ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations

【Google论文强烈推荐】ALBERT:基于精简BERT的自我监督学习的语言表示，ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations

专知会员服务

24+阅读 · 2019年12月21日

【Google论文】ALBERT:自我监督学习语言表达的精简BERT

【Google论文】ALBERT:自我监督学习语言表达的精简BERT

专知会员服务

24+阅读 · 2019年11月4日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

热门VIP内容

开通专知VIP会员享更多权益服务

《复杂工程系统模型驱动设计决策支持系统：早期设计阶段挑战》最新138页

《日本陆上自卫队2040年作战方式与未来作战研究》最新23页slides

人工智能作为战争武器

《后勤保障》最新23页

相关资讯

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

AINLP

30+阅读 · 2019年9月8日

站在BERT肩膀上的NLP新秀们（PART III）

站在BERT肩膀上的NLP新秀们（PART III）

AINLP

11+阅读 · 2019年6月18日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

(TensorFlow)实时语义分割比较研究

(TensorFlow)实时语义分割比较研究

机器学习研究会

9+阅读 · 2018年3月12日

相关论文

Pre-training Text Representations as Meta Learning

Arxiv

13+阅读 · 2020年4月12日

A Simple Framework for Contrastive Learning of Visual Representations

Arxiv

21+阅读 · 2020年2月13日

Unsupervised Cross-lingual Representation Learning at Scale

Arxiv

5+阅读 · 2019年11月5日

TinyBERT: Distilling BERT for Natural Language Understanding

TinyBERT: Distilling BERT for Natural Language Understanding

Arxiv

11+阅读 · 2019年9月23日

K-BERT: Enabling Language Representation with Knowledge Graph

K-BERT: Enabling Language Representation with Knowledge Graph

Arxiv

19+阅读 · 2019年9月17日

Semantics-aware BERT for Language Understanding

Arxiv

4+阅读 · 2019年9月5日

Pre-trained Language Model Representations for Language Generation

Arxiv

5+阅读 · 2019年4月1日

BERT for Joint Intent Classification and Slot Filling

Arxiv

13+阅读 · 2019年2月28日

BioBERT: a pre-trained biomedical language representation model for biomedical text mining

BioBERT: a pre-trained biomedical language representation model for biomedical text mining

Arxiv

7+阅读 · 2019年2月3日

Deep contextualized word representations

Arxiv

10+阅读 · 2018年3月22日

微信扫码咨询专知VIP会员