In this paper, a novel adjustable fine-tuning method is proposed that improves the inference time of the BERT model on downstream tasks. The proposed method identifies the more important word vectors in each layer using the proposed Attention Context Contribution (ACC) metric and eliminates the less important ones according to the proposed strategy. With the TiltedBERT method, the model learns to work with a considerably lower number of Floating Point Operations (FLOPs) than the original BERT-base model. The proposed method does not require training from scratch and can be generalized to other transformer-based models. Extensive experiments show that the word vectors in higher layers contribute less and can be eliminated to improve inference time. Experimental results on a wide range of sentiment analysis, classification, and regression datasets, as well as on benchmarks such as IMDB and GLUE, show that TiltedBERT is effective across various datasets. TiltedBERT improves the inference time of BERT-base by up to 4.8 times with less than a 0.75% accuracy drop on average. After fine-tuning, owing to the offline-tuning property, the inference time of the model can be adjusted over a wide range of Tilt-Rate selections. In addition, a mathematical speedup analysis is proposed that accurately estimates the speedup of the TiltedBERT method. With the help of this analysis, a suitable Tilt-Rate value can be selected before fine-tuning and during the offline-tuning phase.
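As an informal illustration only (not the paper's actual implementation), the sketch below shows the kind of per-layer word-vector elimination the abstract describes: a hypothetical prune_word_vectors helper keeps only the highest-scoring vectors in a layer, where the scores array is a stand-in for the ACC metric and tilt_rate plays the role of the Tilt-Rate (the fraction of vectors to drop).

```python
import numpy as np

def prune_word_vectors(hidden_states, scores, tilt_rate):
    """Keep only the highest-scoring word vectors of one layer's output.

    hidden_states: (seq_len, hidden_dim) word vectors of one layer
    scores:        (seq_len,) importance score per word vector
                   (stand-in for the paper's ACC metric)
    tilt_rate:     fraction of word vectors to eliminate (0.0 keeps all)
    """
    seq_len = hidden_states.shape[0]
    n_keep = max(1, int(round(seq_len * (1.0 - tilt_rate))))
    # Indices of the n_keep most important vectors, kept in original order
    keep = np.sort(np.argsort(scores)[::-1][:n_keep])
    return hidden_states[keep], keep

# Toy usage: 8 word vectors of width 4 with random importance scores
rng = np.random.default_rng(0)
h = rng.standard_normal((8, 4))
s = rng.random(8)
pruned, kept_idx = prune_word_vectors(h, s, tilt_rate=0.5)
print(pruned.shape, kept_idx)  # (4, 4) and the indices that survived
```

Because fewer word vectors are passed to the subsequent (higher) layers, the attention and feed-forward computations in those layers operate on shorter sequences, which is how a FLOPs and inference-time reduction of the kind reported above would arise.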