July 30, 2021

Efficient Sparse Spherical k-Means for Document Clustering


Johannes Knittel, Steffen Koch, Thomas Ertl

Source: arXiv, ACM DocEng 2021

Spherical k-Means is frequently used to cluster document collections because it performs reasonably well in many settings and is computationally efficient. However, the time complexity increases linearly with the number of clusters k, which limits the suitability of the algorithm for larger values of k depending on the size of the collection. Optimizations targeted at the Euclidean k-Means algorithm largely do not apply because the cosine distance is not a metric. We therefore propose an efficient indexing structure to improve the scalability of Spherical k-Means with respect to k. Our approach exploits the sparsity of the input vectors and the convergence behavior of k-Means to reduce the number of comparisons on each iteration significantly.
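To make the cost argument concrete, here is a minimal sketch of Spherical k-Means on sparse vectors. It is not the authors' indexing structure, which the abstract does not describe in detail. It only illustrates the general idea of exploiting sparsity: an inverted index over the centroids' non-zero terms lets each document accumulate dot products only with centroids that share a term with it, rather than comparing against all k centroids in full. The convergence-based pruning the abstract mentions is not covered. All names (`normalize`, `build_centroid_index`, `assign`, `spherical_kmeans`), the dict-based vector representation, and the naive seeding with the first k documents are assumptions made for illustration.

```python
import math
from collections import defaultdict


def normalize(vec):
    """Scale a sparse vector (dict: term -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else {}


def build_centroid_index(centroids):
    """Inverted index: term -> list of (cluster id, centroid weight)."""
    index = defaultdict(list)
    for c, centroid in enumerate(centroids):
        for term, weight in centroid.items():
            index[term].append((c, weight))
    return index


def assign(doc, index, k):
    """Return the cluster whose centroid has the highest cosine similarity.

    Both doc and centroids are unit length, so the dot product equals the
    cosine similarity. Only centroids sharing a term with doc are touched.
    """
    sims = [0.0] * k
    for term, w in doc.items():
        for c, cw in index.get(term, ()):
            sims[c] += w * cw
    return max(range(k), key=sims.__getitem__)


def spherical_kmeans(docs, k, iterations=20):
    docs = [normalize(d) for d in docs]
    centroids = [dict(d) for d in docs[:k]]  # assumption: naive seeding
    labels = [-1] * len(docs)
    for _ in range(iterations):
        index = build_centroid_index(centroids)
        new_labels = [assign(d, index, k) for d in docs]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        # New centroid = normalized sum of the member documents.
        sums = [defaultdict(float) for _ in range(k)]
        for d, c in zip(docs, labels):
            for t, w in d.items():
                sums[c][t] += w
        # An empty cluster keeps its previous centroid.
        centroids = [normalize(s) if s else centroids[c]
                     for c, s in enumerate(sums)]
    return labels, centroids


if __name__ == "__main__":
    toy_docs = [
        {"cat": 1, "dog": 1},
        {"car": 1, "road": 1},
        {"cat": 2, "pet": 1},
        {"car": 1, "engine": 2},
    ]
    print(spherical_kmeans(toy_docs, k=2)[0])
```

In this sketch the assignment step costs time proportional to the number of (document term, centroid) pairs that actually overlap. As centroids become denser over the iterations the saving shrinks, which is one reason a more elaborate structure such as the one the paper proposes may be needed.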

