多视角嵌入空间多源神经专题建模 (Multi-source Neural Topic Modeling in Multi-view Embedding Spaces) - 专知论文

会员服务 ·

0

话题模型 · MoDELS · 词向量表示 · 话题 · 一词多义性 ·

2021 年 4 月 17 日

Multi-source Neural Topic Modeling in Multi-view Embedding Spaces

翻译：多视角嵌入空间多源神经专题建模

Pankaj Gupta,Yatin Chaudhary,Hinrich Schütze

from arxiv, NAACL2021, 13 pages, 14 tables, 2 figures. arXiv admin note: substantial text overlap with arXiv:1909.06563

Though word embeddings and topics are complementary representations, several past works have only used pretrained word embeddings in (neural) topic modeling to address data sparsity in short-text or small collection of documents. This work presents a novel neural topic modeling framework using multi-view embedding spaces: (1) pretrained topic-embeddings, and (2) pretrained word-embeddings (context insensitive from Glove and context-sensitive from BERT models) jointly from one or many sources to improve topic quality and better deal with polysemy. In doing so, we first build respective pools of pretrained topic (i.e., TopicPool) and word embeddings (i.e., WordPool). We then identify one or more relevant source domain(s) and transfer knowledge to guide meaningful learning in the sparse target domain. Within neural topic modeling, we quantify the quality of topics and document representations via generalization (perplexity), interpretability (topic coherence) and information retrieval (IR) using short-text, long-text, small and large document collections from news and medical domains. Introducing the multi-source multi-view embedding spaces, we have shown state-of-the-art neural topic modeling using 6 source (high-resource) and 5 target (low-resource) corpora.

翻译：尽管字嵌入和专题是互补的表述,但过去的一些著作只使用了(神经)专题模型中预先训练的字嵌入,以解决短文本或小文件收集中的数据宽度问题。这项工作提出了使用多视图嵌入空间的新颖神经专题模型框架:(1) 预先训练的专题编入,和(2) 预先训练的字编成(Glove的文体不敏感,BERT模型对背景敏感),从一个或多个来源联合进行,以提高专题质量,更好地处理多元问题。在这样做的时候,我们首先建立各种预先训练的专题(即Top Pool)和字嵌入(即Word Pool)的集合。然后我们确定一个或多个相关的源域,并转让知识,以指导稀少的目标域的有意义的学习。在神经模型中,我们用短文本、长文本、小型和大型文件库集成(IR)来量化专题和文件表述的质量,使用短文本、长文本、小、大文本以及来自新闻和医学领域的嵌入软件库,我们展示了多源数据库5号数据库。

0

相关内容

话题模型

【WWW2021】实体自适应语义依赖图立场检测

专知会员服务

22+阅读 · 2021年4月15日

【CIKM2020】神经逻辑推理，Neural Logic Reasoning

【CIKM2020】神经逻辑推理，Neural Logic Reasoning

专知会员服务

51+阅读 · 2020年8月25日

【IJCAJ 2019】多视角知识图谱嵌入的实体对齐，Multi-view Knowledge Graph Embedding for Entity Alignment

【IJCAJ 2019】多视角知识图谱嵌入的实体对齐，Multi-view Knowledge Graph Embedding for Entity Alignment

专知会员服务

59+阅读 · 2020年6月30日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

回顾机器学习公平的数学框架，Review of Mathematical frameworks for Fairness in Machine Learning

回顾机器学习公平的数学框架，Review of Mathematical frameworks for Fairness in Machine Learning

专知会员服务

38+阅读 · 2020年5月30日

【领域对抗学习的低资源文本分类】Low-Resource Text Classification using Domain-Adversarial Learning

【领域对抗学习的低资源文本分类】Low-Resource Text Classification using Domain-Adversarial Learning

专知会员服务

23+阅读 · 2020年4月22日

【微软亚洲研究院】无监督词嵌入对齐的几何感知域自适应，Geometry-aware Domain Adaptation for Unsupervised Alignment of Word Embeddings

【微软亚洲研究院】无监督词嵌入对齐的几何感知域自适应，Geometry-aware Domain Adaptation for Unsupervised Alignment of Word Embeddings

专知会员服务

23+阅读 · 2020年4月21日

【论文】多关系庞加莱图嵌入（Multi-relational Poincaré Graph Embeddings），爱丁堡大学| Ivana Balažević

【论文】多关系庞加莱图嵌入（Multi-relational Poincaré Graph Embeddings），爱丁堡大学| Ivana Balažević

专知会员服务

59+阅读 · 2019年12月30日

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

专知会员服务

43+阅读 · 2019年11月25日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

深度学习自然语言处理

18+阅读 · 2020年5月22日

IEEE | DSC 2019诚邀稿件 (EI检索)

IEEE | DSC 2019诚邀稿件 (EI检索)

Call4Papers

10+阅读 · 2019年2月25日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

专知

20+阅读 · 2018年6月29日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

25篇AAAI 2018接收论文在哈工大直播预讲，顶会预先看！

25篇AAAI 2018接收论文在哈工大直播预讲，顶会预先看！

AI科技评论

6+阅读 · 2018年1月7日

计算机视觉近一年进展综述

计算机视觉近一年进展综述

机器学习研究会

9+阅读 · 2017年11月25日

原创 | Attention Modeling for Targeted Sentiment

原创 | Attention Modeling for Targeted Sentiment

黑龙江大学自然语言处理实验室

25+阅读 · 2017年11月5日

【计算机类】期刊专刊/国际会议截稿信息6条

【计算机类】期刊专刊/国际会议截稿信息6条

Call4Papers

3+阅读 · 2017年10月13日

自然语言处理 (三)　之　word embedding

自然语言处理 (三)　之　word embedding

DeepLearning中文论坛

19+阅读 · 2015年8月3日

Transfer Learning for Future Wireless Networks: A Comprehensive Survey

Arxiv

0+阅读 · 2021年8月8日

Task Programming: Learning Data Efficient Behavior Representations

Arxiv

4+阅读 · 2021年3月29日

A Survey on Contextual Embeddings

Arxiv

29+阅读 · 2020年3月16日

Improving Textual Network Embedding with Global Attention via Optimal Transport

Improving Textual Network Embedding with Global Attention via Optimal Transport

Arxiv

3+阅读 · 2019年6月5日

Cross-lingual Knowledge Graph Alignment via Graph Matching Neural Network

Arxiv

15+阅读 · 2019年5月28日

Multi-task Learning of Pairwise Sequence Classification Tasks Over Disparate Label Spaces

Arxiv

3+阅读 · 2018年4月9日

TransRev: Modeling Reviews as Translations from Users to Items

Arxiv

9+阅读 · 2018年1月30日

Discrete Autoencoders for Sequence Models

Arxiv

6+阅读 · 2018年1月29日

mvn2vec: Preservation and Collaboration in Multi-View Network Embedding

Arxiv

10+阅读 · 2018年1月19日

Topic Compositional Neural Language Model

Arxiv

5+阅读 · 2017年12月29日

VIP会员

文章信息

相关主题

词向量表示

一词多义性

相关VIP内容

【WWW2021】实体自适应语义依赖图立场检测

专知会员服务

22+阅读 · 2021年4月15日

【CIKM2020】神经逻辑推理，Neural Logic Reasoning

【CIKM2020】神经逻辑推理，Neural Logic Reasoning

专知会员服务

51+阅读 · 2020年8月25日

【IJCAJ 2019】多视角知识图谱嵌入的实体对齐，Multi-view Knowledge Graph Embedding for Entity Alignment

【IJCAJ 2019】多视角知识图谱嵌入的实体对齐，Multi-view Knowledge Graph Embedding for Entity Alignment

专知会员服务

59+阅读 · 2020年6月30日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

回顾机器学习公平的数学框架，Review of Mathematical frameworks for Fairness in Machine Learning

回顾机器学习公平的数学框架，Review of Mathematical frameworks for Fairness in Machine Learning

专知会员服务

38+阅读 · 2020年5月30日

【领域对抗学习的低资源文本分类】Low-Resource Text Classification using Domain-Adversarial Learning

【领域对抗学习的低资源文本分类】Low-Resource Text Classification using Domain-Adversarial Learning

专知会员服务

23+阅读 · 2020年4月22日

【微软亚洲研究院】无监督词嵌入对齐的几何感知域自适应，Geometry-aware Domain Adaptation for Unsupervised Alignment of Word Embeddings

【微软亚洲研究院】无监督词嵌入对齐的几何感知域自适应，Geometry-aware Domain Adaptation for Unsupervised Alignment of Word Embeddings

专知会员服务

23+阅读 · 2020年4月21日

【论文】多关系庞加莱图嵌入（Multi-relational Poincaré Graph Embeddings），爱丁堡大学| Ivana Balažević

【论文】多关系庞加莱图嵌入（Multi-relational Poincaré Graph Embeddings），爱丁堡大学| Ivana Balažević

专知会员服务

59+阅读 · 2019年12月30日

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

专知会员服务

43+阅读 · 2019年11月25日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

热门VIP内容

开通专知VIP会员享更多权益服务

《巡飞弹药（爆炸性无人机）威胁态势分析》最新24页报告

《军用后勤无人机：破解战场运输挑战的创新方案》

人工智能战争：以色列、伊朗与新型AI战争形态

《俄乌战争：现代战争未来的启示与经验》

相关资讯

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

深度学习自然语言处理

18+阅读 · 2020年5月22日

IEEE | DSC 2019诚邀稿件 (EI检索)

IEEE | DSC 2019诚邀稿件 (EI检索)

Call4Papers

10+阅读 · 2019年2月25日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

专知

20+阅读 · 2018年6月29日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

25篇AAAI 2018接收论文在哈工大直播预讲，顶会预先看！

25篇AAAI 2018接收论文在哈工大直播预讲，顶会预先看！

AI科技评论

6+阅读 · 2018年1月7日

计算机视觉近一年进展综述

计算机视觉近一年进展综述

机器学习研究会

9+阅读 · 2017年11月25日

原创 | Attention Modeling for Targeted Sentiment

原创 | Attention Modeling for Targeted Sentiment

黑龙江大学自然语言处理实验室

25+阅读 · 2017年11月5日

【计算机类】期刊专刊/国际会议截稿信息6条

【计算机类】期刊专刊/国际会议截稿信息6条

Call4Papers

3+阅读 · 2017年10月13日

自然语言处理 (三)　之　word embedding

自然语言处理 (三)　之　word embedding

DeepLearning中文论坛

19+阅读 · 2015年8月3日

相关论文

Transfer Learning for Future Wireless Networks: A Comprehensive Survey

Arxiv

0+阅读 · 2021年8月8日

Task Programming: Learning Data Efficient Behavior Representations

Arxiv

4+阅读 · 2021年3月29日

A Survey on Contextual Embeddings

Arxiv

29+阅读 · 2020年3月16日

Improving Textual Network Embedding with Global Attention via Optimal Transport

Improving Textual Network Embedding with Global Attention via Optimal Transport

Arxiv

3+阅读 · 2019年6月5日

Cross-lingual Knowledge Graph Alignment via Graph Matching Neural Network

Arxiv

15+阅读 · 2019年5月28日

Multi-task Learning of Pairwise Sequence Classification Tasks Over Disparate Label Spaces

Arxiv

3+阅读 · 2018年4月9日

TransRev: Modeling Reviews as Translations from Users to Items

Arxiv

9+阅读 · 2018年1月30日

Discrete Autoencoders for Sequence Models

Arxiv

6+阅读 · 2018年1月29日

mvn2vec: Preservation and Collaboration in Multi-View Network Embedding

Arxiv

10+阅读 · 2018年1月19日

Topic Compositional Neural Language Model

Arxiv

5+阅读 · 2017年12月29日

微信扫码咨询专知VIP会员