Graph clustering and community detection are central problems in modern data mining. The increasing need for analyzing billion-scale data calls for faster and more scalable algorithms for these problems. There are certain trade-offs between the quality and speed of such clustering algorithms. In this paper, we design scalable algorithms that achieve high quality when evaluated based on ground truth. We develop a generalized sequential and shared-memory parallel framework based on the LambdaCC objective (introduced by Veldt et al.), which encompasses modularity and correlation clustering. Our framework consists of highly-optimized implementations that scale to large data sets of billions of edges and that obtain high-quality clusters compared to ground-truth data, on both unweighted and weighted graphs. Our empirical evaluation shows that this framework improves the state-of-the-art trade-offs between speed and quality of scalable community detection. For example, on a 30-core machine with two-way hyper-threading, our implementations achieve orders of magnitude speedups over other correlation clustering baselines, and up to 28.44x speedups over our own sequential baselines while maintaining or improving quality.
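For concreteness, a minimal sketch of the LambdaCC objective in its standard (non-degree-weighted) form, following Veldt et al.; the exact weighting and variants used in this paper may differ. Given an unweighted graph $G=(V,E)$ and a resolution parameter $\lambda \in (0,1)$, the goal is to find a clustering minimizing the weight of disagreements:
$$\min \;\; \sum_{(i,j)\in E} (1-\lambda)\, x_{ij} \;+\; \sum_{(i,j)\notin E} \lambda\, (1 - x_{ij}),$$
where $x_{ij}=1$ if $i$ and $j$ are placed in different clusters and $x_{ij}=0$ otherwise. Setting $\lambda$ appropriately interpolates between correlation-clustering-like and modularity-like behavior, which is why the framework encompasses both objectives.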