Pre-trained contextual language models are ubiquitously employed for language understanding tasks, but are unsuitable for resource-constrained systems. Noncontextual word embeddings are an efficient alternative in these settings. Such methods typically encode multiple distinct meanings of a word with a single vector and therefore incur errors due to polysemy. This paper proposes a two-stage method to distill multiple word senses from a pre-trained language model (BERT): attention over the senses of a word in context is used to disambiguate it, and this sense information is transferred to fit multi-sense embeddings in a skip-gram-like framework. We demonstrate an effective approach to training the sense-disambiguation mechanism in our model with a distribution over word senses extracted from the output-layer embeddings of BERT. Experiments on contextual word similarity and sense-induction tasks show that this method is superior to or competitive with state-of-the-art multi-sense embeddings on multiple benchmark data sets, and experiments with an embedding-based topic model (ETM) demonstrate the benefits of using these multi-sense embeddings in a downstream application.
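The attention-over-senses step summarized above can be illustrated with a minimal sketch. The dot-product scoring, dimensions, and function names below are illustrative assumptions, not the paper's exact formulation (which trains the disambiguation mechanism against a sense distribution distilled from BERT's output-layer embeddings).

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def contextual_sense_embedding(sense_vectors, context_vector):
    """Attend over a word's sense vectors given a context representation.

    sense_vectors: (K, d) array, one row per candidate sense of the word.
    context_vector: (d,) array summarizing the surrounding context.
    Returns the attention weights over senses and the resulting
    context-dependent word embedding (a convex combination of the senses).
    """
    scores = sense_vectors @ context_vector   # (K,) dot-product scores (assumed scoring function)
    weights = softmax(scores)                 # distribution over senses
    return weights, weights @ sense_vectors   # (d,) blended embedding

# Toy usage: a word with 3 hypothetical senses in a 4-dimensional space.
rng = np.random.default_rng(0)
senses = rng.normal(size=(3, 4))
context = rng.normal(size=4)
w, emb = contextual_sense_embedding(senses, context)
print("sense weights:", w)
print("contextual embedding:", emb)
```

In the skip-gram-like second stage, a blended embedding of this kind would stand in for the single word vector when predicting context words, so that gradients update only the senses the attention selects.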