多语种主题模式 (Multilingual Topic Models) - 专知论文

会员服务 ·

0

话题模型 · MoDELS · 词表 · 潜变量/隐变量 · Information Sciences ·

2017 年 12 月 18 日

Multilingual Topic Models

翻译：多语种主题模式

Kriste Krstovski,Michael J. Kurtz,David A. Smith,Alberto Accomazzi

from arxiv, 18 pages, 9 figures

Scientific publications have evolved several features for mitigating vocabulary mismatch when indexing, retrieving, and computing similarity between articles. These mitigation strategies range from simply focusing on high-value article sections, such as titles and abstracts, to assigning keywords, often from controlled vocabularies, either manually or through automatic annotation. Various document representation schemes possess different cost-benefit tradeoffs. In this paper, we propose to model different representations of the same article as translations of each other, all generated from a common latent representation in a multilingual topic model. We start with a methodological overview on latent variable models for parallel document representations that could be used across many information science tasks. We then show how solving the inference problem of mapping diverse representations into a shared topic space allows us to evaluate representations based on how topically similar they are to the original article. In addition, our proposed approach provides means to discover where different concept vocabularies require improvement.

翻译：科学出版物在索引编制、检索和计算各条款之间的相似性时,为减少词汇不匹配而发展了几个特征,这些减缓战略从仅仅侧重于诸如标题和摘要等高价值物品章节到指定关键词,通常是从受控制的词汇中人工或通过自动注解,各种文件表述计划具有不同的成本效益取舍。在本文件中,我们提议将同一条款作为翻译品的不同表述方式建模,所有这些都来自多语种专题模型中共同的潜在代表物。我们首先从方法上概述潜在的可变模型开始,这些模型可以用于许多信息科学任务。然后我们展示如何解决将不同表述图解绘制成一个共同主题空间的推论问题,从而使我们能够根据它们与原始条款在专题上如何相似来评价表述。此外,我们提议的方法提供了方法,以发现不同概念词汇需要改进的地方。

3

相关内容

话题模型

【IJCAI2020】统计相关模型，A Complete Characterization of Projectivity for Statistical Relational Models

【IJCAI2020】统计相关模型，A Complete Characterization of Projectivity for Statistical Relational Models

专知会员服务

20+阅读 · 2020年4月25日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

253+阅读 · 2020年4月19日

AAAI 2020 | 滴滴自主提出基于注意力机制的异构图神经网络模型

专知会员服务

53+阅读 · 2020年2月26日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

IEEE | DSC 2019诚邀稿件 (EI检索)

IEEE | DSC 2019诚邀稿件 (EI检索)

Call4Papers

10+阅读 · 2019年2月25日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

用 LDA 和 LSA 两种方法来降维和做 Topic 建模

用 LDA 和 LSA 两种方法来降维和做 Topic 建模

AI研习社

13+阅读 · 2018年8月24日

计算机视觉近一年进展综述

计算机视觉近一年进展综述

机器学习研究会

9+阅读 · 2017年11月25日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Improving the Transformer Translation Model with Document-Level Context

Arxiv

4+阅读 · 2018年10月8日

Dynamic and Static Topic Model for Analyzing Time-Series Document Collections

Arxiv

8+阅读 · 2018年5月6日

Lessons from the Bible on Modern Topics: Low-Resource Multilingual Topic Model Evaluation

Arxiv

4+阅读 · 2018年4月26日

Scalable Generalized Dynamic Topic Models

Arxiv

7+阅读 · 2018年3月21日

Application of Rényi and Tsallis Entropies to Topic Modeling Optimization

Arxiv

6+阅读 · 2018年2月28日

The Search Problem in Mixture Models

Arxiv

3+阅读 · 2018年2月24日

Learning Topic Models by Neighborhood Aggregation

Arxiv

3+阅读 · 2018年2月22日

Discrete Autoencoders for Sequence Models

Arxiv

6+阅读 · 2018年1月29日

Knowledge-based Word Sense Disambiguation using Topic Models

Arxiv

5+阅读 · 2018年1月5日

Continuous Time Dynamic Topic Models

Arxiv

3+阅读 · 2015年5月16日

VIP会员

文章信息

相关主题

潜变量/隐变量

Information Sciences

相关VIP内容

【IJCAI2020】统计相关模型，A Complete Characterization of Projectivity for Statistical Relational Models

【IJCAI2020】统计相关模型，A Complete Characterization of Projectivity for Statistical Relational Models

专知会员服务

20+阅读 · 2020年4月25日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

253+阅读 · 2020年4月19日

AAAI 2020 | 滴滴自主提出基于注意力机制的异构图神经网络模型

专知会员服务

53+阅读 · 2020年2月26日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《俄乌战争中的无人系统：新的战争方式与新兴趋势——来自前线的印象》报告

《海上自主水面船舶远程操作中心：安全可持续运行的多维度分析》

多模态大语言模型下游调优中“保持自我”的重要性

隐身自主无人水下航行器技术如何变革水下作战并重塑海军竞争

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

IEEE | DSC 2019诚邀稿件 (EI检索)

IEEE | DSC 2019诚邀稿件 (EI检索)

Call4Papers

10+阅读 · 2019年2月25日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

用 LDA 和 LSA 两种方法来降维和做 Topic 建模

用 LDA 和 LSA 两种方法来降维和做 Topic 建模

AI研习社

13+阅读 · 2018年8月24日

计算机视觉近一年进展综述

计算机视觉近一年进展综述

机器学习研究会

9+阅读 · 2017年11月25日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

相关论文

Improving the Transformer Translation Model with Document-Level Context

Arxiv

4+阅读 · 2018年10月8日

Dynamic and Static Topic Model for Analyzing Time-Series Document Collections

Arxiv

8+阅读 · 2018年5月6日

Lessons from the Bible on Modern Topics: Low-Resource Multilingual Topic Model Evaluation

Arxiv

4+阅读 · 2018年4月26日

Scalable Generalized Dynamic Topic Models

Arxiv

7+阅读 · 2018年3月21日

Application of Rényi and Tsallis Entropies to Topic Modeling Optimization

Arxiv

6+阅读 · 2018年2月28日

The Search Problem in Mixture Models

Arxiv

3+阅读 · 2018年2月24日

Learning Topic Models by Neighborhood Aggregation

Arxiv

3+阅读 · 2018年2月22日

Discrete Autoencoders for Sequence Models

Arxiv

6+阅读 · 2018年1月29日

Knowledge-based Word Sense Disambiguation using Topic Models

Arxiv

5+阅读 · 2018年1月5日

Continuous Time Dynamic Topic Models

Arxiv

3+阅读 · 2015年5月16日

微信扫码咨询专知VIP会员