Analysis of short text, such as social media posts, is extremely difficult because topic models rely on observing many document-level word co-occurrences. Beyond estimating topic distributions, a common downstream task is grouping the authors of these documents for subsequent analyses. Traditional approaches estimate the document groupings first and then identify user clusters in a separate, independent procedure. We propose a novel model that extends Latent Dirichlet Allocation by modeling strong dependence among the words in the same document, governed by user-level topic distributions. We also cluster users simultaneously, removing the need for post-hoc cluster estimation and improving topic estimation by shrinking noisy user-level topic distributions towards typical values. Our method performs as well as, or better than, traditional approaches to problems arising in short text, and we demonstrate its usefulness on a dataset of tweets from United States Senators, recovering both meaningful topics and clusters that reflect partisan ideology.
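The proposed model itself is not specified in this abstract, but the baseline it improves on can be illustrated. A common workaround for sparse word co-occurrence in short text is to pool each author's posts into a single document and fit vanilla LDA, yielding user-level topic distributions. The sketch below is a minimal collapsed Gibbs sampler for standard LDA under that pooling scheme; the function name and hyperparameter defaults are illustrative, not taken from the paper.

```python
import random


def lda_gibbs(docs, K, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for vanilla LDA.

    docs: list of token lists -- here, one pooled document per user.
    Returns (theta, vocab, nkw): per-document topic proportions,
    the sorted vocabulary, and topic-word counts.
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    V = len(vocab)
    widx = {w: i for i, w in enumerate(vocab)}

    # Count tables: doc-topic, topic-word, and topic totals.
    ndk = [[0] * K for _ in docs]
    nkw = [[0] * V for _ in range(K)]
    nk = [0] * K

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            t = rng.randrange(K)
            zs.append(t)
            ndk[d][t] += 1
            nkw[t][widx[w]] += 1
            nk[t] += 1
        z.append(zs)

    # Gibbs sweeps: remove a token's assignment, resample it from the
    # collapsed conditional, and add it back.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                wi = widx[w]
                ndk[d][t] -= 1; nkw[t][wi] -= 1; nk[t] -= 1
                weights = [
                    (ndk[d][k] + alpha) * (nkw[k][wi] + beta) / (nk[k] + V * beta)
                    for k in range(K)
                ]
                t = rng.choices(range(K), weights=weights)[0]
                z[d][i] = t
                ndk[d][t] += 1; nkw[t][wi] += 1; nk[t] += 1

    # Smoothed per-document (per-user) topic proportions.
    theta = [
        [(ndk[d][k] + alpha) / (len(docs[d]) + K * alpha) for k in range(K)]
        for d in range(len(docs))
    ]
    return theta, vocab, nkw
```

Under author pooling, each entry of `theta` is a user-level topic distribution; the abstract's model instead estimates these jointly with user clusters, shrinking them towards cluster-typical values rather than treating each user independently.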