Building Graphs at a Large Scale: Union Find Shuffle - 专知 (Zhuanzhi) Papers

January 25, 2021

Building Graphs at a Large Scale: Union Find Shuffle

Saigopal Thota, Mridul Jain, Nishad Kamat, Saikiran Malikireddy, Pruthvi Raj Eranti, Albin Kuruvilla

Large-scale graph processing using distributed computing frameworks is becoming pervasive and efficient in the industry. In this work, we present a highly scalable and configurable distributed algorithm for building connected components, called Union Find Shuffle (UFS) with Path Compression. The scale and complexity of the algorithm are a function of the number of partitions into which the data is initially partitioned, and the size of the connected components. We discuss the complexity and the benchmarks compared to similar approaches. We also present current benchmarks of our production system, running on commodity out-of-the-box cloud Hadoop infrastructure, where the algorithm was deployed over a year ago and has scaled to around 75 billion nodes and 60 billion linkages (and growing). We highlight the key aspects of our algorithm that enable seamless scaling and performance even in the presence of skewed data with large connected components of around 10 billion nodes each.
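
The abstract does not spell out the steps of UFS, so the following is only a minimal single-machine sketch of the general idea it names: union-find with path compression, run locally on each partition of the edge list and then merged. The class `UnionFind`, the function `connected_components`, the `num_partitions` parameter, and the round-robin partitioning are all assumptions made for illustration; this is not the authors' distributed algorithm.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure with path compression (illustrative only)."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Register unseen nodes as their own root.
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Keep the smaller id as root so the result is deterministic.
            if rb < ra:
                ra, rb = rb, ra
            self.parent[rb] = ra


def connected_components(edges, num_partitions=2):
    """Assumed toy pipeline: local union-find per partition, then a merge."""
    # Hypothetical round-robin split standing in for the initial partitioning.
    partitions = [edges[i::num_partitions] for i in range(num_partitions)]
    merged = UnionFind()
    for part in partitions:
        local = UnionFind()
        for a, b in part:
            local.union(a, b)
        # Emit (node, local root) pairs and merge them, imitating a shuffle.
        for node in list(local.parent):
            merged.union(node, local.find(node))
    groups = defaultdict(set)
    for node in list(merged.parent):
        groups[merged.find(node)].add(node)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    print(connected_components([(1, 2), (3, 4), (2, 3), (5, 6)]))
```

In this sketch each partition only ever sees its own edges, and the merge step works on the much smaller set of (node, root) pairs; a real distributed version would replace the in-memory merge with shuffle rounds across workers.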
