Par Chain:利用近邻链条平行高层级群群群集框架 (ParChain: A Framework for Parallel Hierarchical Agglomerative Clustering using Nearest-Neighbor Chain) - 专知论文

会员服务 ·

0

簇 · 优化器 · state-of-the-art · 可约的 · 层次聚类 ·

2022 年 2 月 14 日

ParChain: A Framework for Parallel Hierarchical Agglomerative Clustering using Nearest-Neighbor Chain

翻译：Par Chain:利用近邻链条平行高层级群群群集框架

Shangdi Yu,Yiqiu Wang,Yan Gu,Laxman Dhulipala,Julian Shun

This paper studies the hierarchical clustering problem, where the goal is to produce a dendrogram that represents clusters at varying scales of a data set. We propose the ParChain framework for designing parallel hierarchical agglomerative clustering (HAC) algorithms, and using the framework we obtain novel parallel algorithms for the complete linkage, average linkage, and Ward's linkage criteria. Compared to most previous parallel HAC algorithms, which require quadratic memory, our new algorithms require only linear memory, and are scalable to large data sets. ParChain is based on our parallelization of the nearest-neighbor chain algorithm, and enables multiple clusters to be merged on every round. We introduce two key optimizations that are critical for efficiency: a range query optimization that reduces the number of distance computations required when finding nearest neighbors of clusters, and a caching optimization that stores a subset of previously computed distances, which are likely to be reused. Experimentally, we show that our highly-optimized implementations using 48 cores with two-way hyper-threading achieve 5.8--110.1x speedup over state-of-the-art parallel HAC algorithms and achieve 13.75--54.23x self-relative speedup. Compared to state-of-the-art algorithms, our algorithms require up to 237.3x less space. Our algorithms are able to scale to data set sizes with tens of millions of points, which existing algorithms are not able to handle.

翻译：本文研究等级分组问题, 其目标在于生成一个以不同尺度的数据集群集为代表群集的巢体。我们提议 Par Chain 框架, 用于设计平行的等级群集群集( HAC) 算法, 并使用这个框架, 我们获得新的平行算法, 用于完整连接、平均链接和沃德的连接标准。与大多数先前的平行 HAC 算法相比, 我们的新算法只需要线性内存, 并且可以向大型数据集缩放。 Par Chain 是基于我们近邻连锁算法的平行化, 并且可以使每个回合的多个群组合并。我们引入了两种关键优化, 这对于效率至关重要: 范围查询优化, 减少寻找最近的群集邻居、平均链接和沃德的联系标准。与大多数以前计算过的距离的一组算法相比, 这可能会被再利用。实验性地, 我们用48个高精度核心实施, 双向超速读算算法, 实现5. 110.1x 速度速度, 我们的自动算算算算算算算算为13. 54 的自动算算算算算算算算算算算为更慢的自动算算算法, 。

0

相关内容

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

深度自进化聚类：Deep Self-Evolution Clustering

深度自进化聚类：Deep Self-Evolution Clustering

我爱读PAMI

15+阅读 · 2019年4月13日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

不确定分数阶非线性系统Mittag-Leffler自适应控制

国家自然科学基金

1+阅读 · 2016年12月31日

套子代数的Hochschild上同调及套的分类

国家自然科学基金

3+阅读 · 2014年12月31日

地下水流数值模拟概念模型的不确定性分析

国家自然科学基金

0+阅读 · 2013年12月31日

Erdos-Sos猜想及几个相关的极值组合问题

国家自然科学基金

0+阅读 · 2012年12月31日

Arisandilactone A 的不对称全合成

国家自然科学基金

0+阅读 · 2012年12月31日

下一代移动通信上行链路迭代接收机形式化设计方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

线性积分方程的Galerkin快速谱方法

国家自然科学基金

0+阅读 · 2009年12月31日

编码密码学中若干组合对象研究

国家自然科学基金

0+阅读 · 2009年12月31日

基于Surfacelet多尺度积的三维SAR图像去噪与分割

国家自然科学基金

0+阅读 · 2009年12月31日

基于灰色理论的SAR图像分割及其效果评价方法研究

国家自然科学基金

0+阅读 · 2008年12月31日

Composite Anomaly Detection via Hierarchical Dynamic Search

Arxiv

0+阅读 · 2022年4月20日

Effects of Graph Convolutions in Deep Networks

Arxiv

0+阅读 · 2022年4月20日

Training and Evaluation of Deep Policies using Reinforcement Learning and Generative Models

Arxiv

1+阅读 · 2022年4月18日

Distributed Learning of Deep Neural Networks using Independent Subnet Training

Arxiv

2+阅读 · 2022年4月18日

RIS-Assisted Cooperative NOMA with SWIPT

RIS-Assisted Cooperative NOMA with SWIPT

Arxiv

0+阅读 · 2022年4月18日

Split Learning over Wireless Networks: Parallel Design and Resource Management

Arxiv

0+阅读 · 2022年4月18日

Hierarchical Transformers Are More Efficient Language Models

Arxiv

3+阅读 · 2022年4月16日

Hierarchical Graph Capsule Network

Hierarchical Graph Capsule Network

Arxiv

20+阅读 · 2020年12月16日

Contrastive Clustering

Arxiv

31+阅读 · 2020年9月21日

Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks

Arxiv

14+阅读 · 2019年8月8日

VIP会员

文章信息

相关主题

state-of-the-art

相关VIP内容

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

热门VIP内容

开通专知VIP会员享更多权益服务

《自适应训练辅助系统概念导论及其在空战指挥官加速培训中的应用》125页

《美陆军近战整合企业现代化计划（2025—2026）》最新报告

以色列-伊朗空战：短暂而激烈冲突的启示

《动态作战支援演习框架构建》80页

相关资讯

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

深度自进化聚类：Deep Self-Evolution Clustering

深度自进化聚类：Deep Self-Evolution Clustering

我爱读PAMI

15+阅读 · 2019年4月13日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

相关论文

Composite Anomaly Detection via Hierarchical Dynamic Search

Arxiv

0+阅读 · 2022年4月20日

Effects of Graph Convolutions in Deep Networks

Arxiv

0+阅读 · 2022年4月20日

Training and Evaluation of Deep Policies using Reinforcement Learning and Generative Models

Arxiv

1+阅读 · 2022年4月18日

Distributed Learning of Deep Neural Networks using Independent Subnet Training

Arxiv

2+阅读 · 2022年4月18日

RIS-Assisted Cooperative NOMA with SWIPT

RIS-Assisted Cooperative NOMA with SWIPT

Arxiv

0+阅读 · 2022年4月18日

Split Learning over Wireless Networks: Parallel Design and Resource Management

Arxiv

0+阅读 · 2022年4月18日

Hierarchical Transformers Are More Efficient Language Models

Arxiv

3+阅读 · 2022年4月16日

Hierarchical Graph Capsule Network

Hierarchical Graph Capsule Network

Arxiv

20+阅读 · 2020年12月16日

Contrastive Clustering

Arxiv

31+阅读 · 2020年9月21日

Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks

Arxiv

14+阅读 · 2019年8月8日

相关基金

不确定分数阶非线性系统Mittag-Leffler自适应控制

国家自然科学基金

1+阅读 · 2016年12月31日

套子代数的Hochschild上同调及套的分类

国家自然科学基金

3+阅读 · 2014年12月31日

地下水流数值模拟概念模型的不确定性分析

国家自然科学基金

0+阅读 · 2013年12月31日

Erdos-Sos猜想及几个相关的极值组合问题

国家自然科学基金

0+阅读 · 2012年12月31日

Arisandilactone A 的不对称全合成

国家自然科学基金

0+阅读 · 2012年12月31日

下一代移动通信上行链路迭代接收机形式化设计方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

线性积分方程的Galerkin快速谱方法

国家自然科学基金

0+阅读 · 2009年12月31日

编码密码学中若干组合对象研究

国家自然科学基金

0+阅读 · 2009年12月31日

基于Surfacelet多尺度积的三维SAR图像去噪与分割

国家自然科学基金

0+阅读 · 2009年12月31日

基于灰色理论的SAR图像分割及其效果评价方法研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员