Transformer-based pre-trained language models are vocabulary-dependent, mapping by default each token to its corresponding embedding. This one-to-one mapping results in embedding matrices that occupy a large amount of memory (i.e. millions of parameters) and grow linearly with the size of the vocabulary. Previous work on on-device transformers dynamically generates token embeddings on-the-fly, without an embedding matrix, using locality-sensitive hashing over morphological information. These embeddings are subsequently fed into transformer layers for text classification. However, these methods are not pre-trained. Inspired by this line of work, we propose HashFormers, a new family of vocabulary-independent pre-trained transformers that support an unlimited vocabulary (i.e. all possible tokens in a corpus) given a substantially smaller fixed-size embedding matrix. We achieve this by first introducing computationally cheap hashing functions that bucket individual tokens together into shared embeddings. We also propose three variants that do not require an embedding matrix at all, further reducing the memory requirements. We empirically demonstrate that HashFormers are more memory efficient than standard pre-trained transformers while achieving comparable predictive performance when fine-tuned on multiple text classification tasks. For example, our most efficient HashFormer variant incurs a negligible performance degradation (0.4\% on GLUE) while using only 99.1K parameters to represent the embeddings, compared to the 12.3-38M parameters of state-of-the-art models.
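To make the bucketing idea concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) of a hashed embedding layer: token ids are hashed into a fixed number of buckets, so the embedding table's size is independent of the vocabulary. The class name HashEmbedding, the use of MD5 as the hash, and the parameter values are all illustrative assumptions; HashFormers' actual hashing functions and variants may differ.

    import hashlib

    import torch
    import torch.nn as nn


    class HashEmbedding(nn.Module):
        """Illustrative hashed embedding layer (assumed interface).

        Instead of one embedding row per vocabulary entry, every token id
        is hashed into one of `num_buckets` rows, so the parameter count
        is fixed regardless of vocabulary size.
        """

        def __init__(self, num_buckets: int, embedding_dim: int):
            super().__init__()
            self.num_buckets = num_buckets
            self.table = nn.Embedding(num_buckets, embedding_dim)

        def _bucket(self, token_id: int) -> int:
            # Cheap, deterministic hash of the token id; only illustrative,
            # the paper's exact hashing functions may differ.
            digest = hashlib.md5(str(token_id).encode()).hexdigest()
            return int(digest, 16) % self.num_buckets

        def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
            buckets = torch.tensor(
                [[self._bucket(t.item()) for t in row] for row in token_ids],
                device=token_ids.device,
            )
            return self.table(buckets)


    # Usage: arbitrary token ids (even beyond any fixed vocabulary)
    # map onto a small, fixed-size embedding table.
    emb = HashEmbedding(num_buckets=1024, embedding_dim=128)
    ids = torch.tensor([[5, 70000, 123456789]])
    print(emb(ids).shape)  # torch.Size([1, 3, 128])

Under these assumptions, the embedding parameter count is num_buckets x embedding_dim and no longer grows with the vocabulary, which is the memory saving the abstract describes.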