Structured Inverted-File k-Means Clustering for High-Dimensional Sparse Data - 专知 (Zhuanzhi) Papers


March 30, 2021

Structured Inverted-File k-Means Clustering for High-Dimensional Sparse Data


Kazuo Aoyama, Kazumi Saito

From arXiv, 10 pages, 12 figures

This paper presents SIVF, an architecture-friendly k-means clustering algorithm for large-scale, high-dimensional sparse data sets. The time efficiency of an algorithm is often measured by the number of costly operations it performs, such as similarity calculations. In practice, however, it depends greatly on how well the algorithm adapts to the architecture of the computer system on which it is executed. The proposed SIVF employs an invariant centroid-pair based filter (ICP) to decrease the number of similarity calculations between a data object and the centroids of all the clusters. To maximize ICP performance, SIVF uses an inverted file for the centroid set, structured so as to reduce pipeline hazards. Experiments on real large-scale document data sets demonstrate that SIVF operates at higher speed and with lower memory consumption than existing algorithms. Performance analysis reveals that SIVF achieves its higher speed by suppressing performance degradation factors, namely the number of cache misses and branch mispredictions, rather than by performing fewer similarity calculations.
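The abstract's idea of holding the centroid set in an inverted file can be illustrated with a minimal sketch. This is not the authors' SIVF: it leaves out the ICP filter and the structuring aimed at pipeline hazards, and shows only a plain inverted-file assignment step for sparse vectors. The function names `build_inverted_file` and `assign`, the dict-based sparse representation, and the toy data are assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_file(centroids):
    """Map each term id to a posting list of (centroid_id, weight).

    `centroids` is a list of sparse vectors stored as {term_id: weight};
    this representation is an assumption of the sketch.
    """
    inverted = defaultdict(list)
    for cid, centroid in enumerate(centroids):
        for term, weight in centroid.items():
            if weight != 0.0:
                inverted[term].append((cid, weight))
    return inverted


def assign(obj, inverted, k):
    """Return (best_centroid_id, similarity) for one sparse data object.

    Similarity is the dot product, accumulated only over the posting
    lists of the terms that actually occur in the object.
    """
    sims = [0.0] * k
    for term, value in obj.items():
        for cid, weight in inverted.get(term, ()):
            sims[cid] += value * weight
    best = max(range(k), key=sims.__getitem__)
    return best, sims[best]


# Toy data (hypothetical): three centroids and three objects.
centroids = [
    {0: 0.8, 1: 0.6},
    {2: 1.0},
    {1: 0.6, 3: 0.8},
]
objects = [{0: 1.0}, {2: 0.6, 3: 0.8}, {2: 1.0}]

inverted = build_inverted_file(centroids)
labels = [assign(obj, inverted, len(centroids))[0] for obj in objects]
print(labels)
```

Because each object touches only the posting lists of its own nonzero terms, the work per object scales with the lengths of those lists and not with the full dimensionality. How the lists are laid out in memory is the kind of architectural detail the abstract says SIVF targets, and this sketch makes no attempt to model it.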


