Using SVD for Topic Modeling

Topic model · Singular value decomposition · Estimation/estimator · Singular · Topic

August 29, 2022

Zheng Tracy Ke, Minzhe Wang

From arXiv; 100 pages, 9 figures, 3 tables

The probabilistic topic model imposes a low-rank structure on the expectation of the corpus matrix. Therefore, singular value decomposition (SVD) is a natural tool for dimension reduction. We propose an SVD-based method for estimating a topic model. Our method constructs an estimate of the topic matrix from only a few leading singular vectors of the corpus matrix, which gives it a large advantage in memory use and computational cost on large-scale corpora. The core ideas behind our method include a pre-SVD normalization to tackle severe word-frequency heterogeneity, a post-SVD normalization to create a low-dimensional word embedding that manifests a simplex geometry, and a post-SVD procedure to construct an estimate of the topic matrix directly from the embedded word cloud. We provide the explicit rate of convergence of our method. We show that our method attains the optimal rate in the case of long and moderately long documents, and that it improves on the rates of existing methods in the case of short documents. The key to our analysis is a sharp row-wise large-deviation bound for empirical singular vectors, which is technically demanding to derive and potentially useful for other problems. We apply our method to a corpus of Associated Press news articles and a corpus of abstracts of statistical papers.
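The low-rank claim in the first sentence and the three normalization steps can be written compactly as below. This is our own summary under stated assumptions, not the paper's notation: D is the p×n matrix of word frequencies (p words, n documents), the columns of A are K topic-specific word distributions, the columns of W are the documents' topic weights, r_j is the j-th row of R, v_1, ..., v_K are the simplex vertices, and Π stacks the weight vectors π_j as rows. We believe this matches the authors' Topic-SCORE construction, but the paper should be checked for exact definitions.

```latex
% Illustrative notation (assumed), not quoted from the paper.
\begin{aligned}
&\text{Model:}    && \mathbb{E}[D] = A W,\quad A\in\mathbb{R}^{p\times K},\; W\in\mathbb{R}^{K\times n},\quad \operatorname{rank}\,\mathbb{E}[D]\le K,\\
&\text{Pre-SVD:}  && M=\operatorname{diag}\!\big(\tfrac{1}{n}D\mathbf{1}_n\big),\qquad \hat\xi_1,\dots,\hat\xi_K = \text{leading left singular vectors of } M^{-1/2}D,\\
&\text{Post-SVD:} && \hat R(j,k)=\hat\xi_{k+1}(j)/\hat\xi_1(j),\qquad 1\le j\le p,\; 1\le k\le K-1,\\
&\text{Simplex:}  && \hat r_j \approx \textstyle\sum_{k=1}^{K}\pi_j(k)\,v_k,\qquad \pi_j(k)\ge 0,\; \textstyle\sum_{k}\pi_j(k)=1,\\
&\text{Estimate:} && \hat A \propto M^{1/2}\operatorname{diag}(\hat\xi_1)\,\hat\Pi,\quad\text{each column rescaled to sum to } 1.
\end{aligned}
```

The entrywise ratios in the post-SVD step cancel the word-specific scale carried by the leading singular vector, which is why the rows of R can be expected to fall inside a simplex whose vertices correspond to anchor words.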

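The pipeline described in the abstract can be sketched in a few lines of NumPy. This is a minimal, non-authoritative sketch, not the authors' code: it assumes a dense p×n frequency matrix D whose columns sum to one, assumes every word occurs at least once, and follows our reading of the three steps (the Topic-SCORE construction). The function names `estimate_topics` and `successive_projection` are hypothetical, and the vertex-hunting step uses the successive projection algorithm (SPA) as a stand-in; the paper's own vertex-hunting procedure may differ.

```python
import numpy as np


def successive_projection(R, K):
    """Return K rows of R that approximate the vertices of a simplex.

    Swapped-in vertex hunting (SPA), not necessarily the paper's procedure.
    Rows r_j are lifted to [1, r_j] so the K vertices become linearly
    independent in R^K; SPA repeatedly takes the row with the largest norm
    and projects all rows onto the orthogonal complement of that row.
    """
    X = np.hstack([np.ones((R.shape[0], 1)), R])
    Y = X.copy()
    chosen = []
    for _ in range(K):
        j = int(np.argmax(np.sum(Y ** 2, axis=1)))
        chosen.append(j)
        u = Y[j] / np.linalg.norm(Y[j])
        Y = Y - np.outer(Y @ u, u)
    return R[chosen]


def estimate_topics(D, K):
    """Sketch of an SVD-based estimate of the p x K topic matrix A.

    D: dense p x n array; column i holds the word frequencies of document i.
    Assumes every word (row) has a positive average frequency.
    """
    p, _ = D.shape
    m = D.mean(axis=1)                        # average frequency of each word
    # Pre-SVD normalization: divide row j by sqrt(m_j) to damp frequent words.
    Dn = D / np.sqrt(m)[:, None]
    U, _, _ = np.linalg.svd(Dn, full_matrices=False)
    xi1 = U[:, 0]
    if xi1.sum() < 0:                         # fix the sign of the leading vector
        xi1 = -xi1
    # Post-SVD normalization: entrywise ratios against the leading vector
    # give a p x (K-1) embedding whose rows should fall inside a simplex.
    R = U[:, 1:K] / xi1[:, None]
    V = successive_projection(R, K)           # K x (K-1) estimated vertices
    # Barycentric coordinates: solve [1; V^T] pi_j = [1; r_j] for every word j.
    Q = np.vstack([np.ones(K), V.T])
    Pi = np.linalg.solve(Q, np.vstack([np.ones(p), R.T])).T
    Pi = np.clip(Pi, 0.0, None)               # truncate negative weights
    Pi /= Pi.sum(axis=1, keepdims=True)
    # Map back to the frequency scale and make each column a distribution.
    A = (np.sqrt(m) * xi1)[:, None] * Pi
    return A / A.sum(axis=0, keepdims=True)


if __name__ == "__main__":
    # Noiseless synthetic check with one anchor word per topic (hypothetical data).
    rng = np.random.default_rng(0)
    p, K, n = 30, 3, 200
    A_true = rng.uniform(0.1, 1.0, size=(p, K))
    A_true[:K] = 0.0
    A_true[np.arange(K), np.arange(K)] = 1.0
    A_true /= A_true.sum(axis=0, keepdims=True)
    W = rng.dirichlet(np.ones(K), size=n).T
    A_hat = estimate_topics(A_true @ W, K)
    # Columns come back in SPA order, so match each to its closest true column.
    err = max(min(np.abs(A_hat[:, k] - A_true[:, l]).max() for l in range(K))
              for k in range(K))
    print("max abs error (up to column order):", err)
```

On noiseless synthetic data with one anchor word per topic, the demo should recover the true topic matrix up to column order and floating-point error; with real, noisy counts, the clipping and renormalization of Pi are what keep the estimate a valid set of distributions. Because only the K leading singular vectors are needed, a large sparse corpus would more naturally use a truncated solver such as `scipy.sparse.linalg.svds` instead of the full `np.linalg.svd` used here, which is where the memory and compute savings mentioned in the abstract would come from.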