Theoretically-Efficient and Practical Parallel DBSCAN

January 27, 2021

Yiqiu Wang, Yan Gu, Julian Shun

The DBSCAN method for spatial clustering has received significant attention due to its applicability in a variety of data analysis tasks. There are fast sequential algorithms for DBSCAN in Euclidean space that take $O(n\log n)$ work for two dimensions, sub-quadratic work for three or more dimensions, and can be computed approximately in linear work for any constant number of dimensions. However, existing parallel DBSCAN algorithms require quadratic work in the worst case, making them inefficient for large datasets. This paper bridges the gap between theory and practice of parallel DBSCAN by presenting new parallel algorithms for Euclidean exact DBSCAN and approximate DBSCAN that match the work bounds of their sequential counterparts, and are highly parallel (polylogarithmic depth). We present implementations of our algorithms along with optimizations that improve their practical performance. We perform a comprehensive experimental evaluation of our algorithms on a variety of datasets and parameter settings. Our experiments on a 36-core machine with hyper-threading show that we outperform existing parallel DBSCAN implementations by up to several orders of magnitude, and achieve speedups by up to 33x over the best sequential algorithms.
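The quadratic worst case mentioned in the abstract is easy to see in a brute-force formulation of DBSCAN, where every point is compared against every other point. The following is a minimal sequential sketch in Python and is not the paper's algorithm. The function name `dbscan_bruteforce`, the `NOISE` label, and the parameter names `eps` and `min_pts` are illustrative assumptions, and a point is assumed to count toward its own neighborhood.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster


def dbscan_bruteforce(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    n = len(points)
    # Brute-force eps-neighborhoods: n * n distance computations,
    # which is the source of the quadratic work.
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # A core point has at least min_pts points (itself included) within eps.
    is_core = [len(nb) >= min_pts for nb in neighbors]

    labels = [NOISE] * n
    cluster = 0
    for i in range(n):
        if not is_core[i] or labels[i] != NOISE:
            continue
        # Grow a new cluster from an unlabeled core point.
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            for q in neighbors[p]:
                if labels[q] == NOISE:
                    labels[q] = cluster
                    # Border points join the cluster but do not expand it.
                    if is_core[q]:
                        stack.append(q)
        cluster += 1
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan_bruteforce(points, eps=1.5, min_pts=3))
# → [0, 0, 0, 1, 1, 1, -1]
```

The two tight groups become clusters 0 and 1, and the isolated point is labeled as noise. The neighborhood computation alone costs a number of distance evaluations proportional to $n^2$, which is the cost the algorithms described in the abstract are designed to avoid.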
