Pre-trained large-scale language models such as BERT have gained a lot of attention thanks to their outstanding performance on a wide range of natural language tasks. However, due to their large number of parameters, they are resource-intensive both to deploy and to fine-tune. Researchers have created several methods for distilling language models into smaller ones to increase efficiency, at the cost of a small performance trade-off. In this paper, we create several different distilled versions of the state-of-the-art Dutch RobBERT model and call them RobBERTje. The distilled models differ in their distillation corpus, namely whether the corpus is shuffled and whether subsequent sentences are merged. We found that the performance of models trained on the shuffled versus non-shuffled datasets is similar for most tasks, and that randomly merging subsequent sentences in the corpus creates models that train faster and perform better on tasks with long sequences. Upon comparing distillation architectures, we found that the larger DistilBERT architecture worked significantly better than the Bort hyperparametrization. Interestingly, we also found that the distilled models exhibit less gender-stereotypical bias than their teacher model. Since smaller architectures decrease the time to fine-tune, these models allow for more efficient training and more lightweight deployment for many Dutch downstream language tasks.