混合地物大数据集群 (Clustering of Big Data with Mixed Features) - 专知论文

会员服务 ·

0

簇 · CC · 可约的 · k-近邻 · 异常点 ·

2020 年 11 月 11 日

Clustering of Big Data with Mixed Features

翻译：混合地物大数据集群

Joshua Tobin,Mimi Zhang

from arxiv, 22 pages, 9 figures, for associated Python library, see https://pypi.org/project/CPFcluster/ , submitted to SDM 2021

Clustering large, mixed data is a central problem in data mining. Many approaches adopt the idea of k-means, and hence are sensitive to initialisation, detect only spherical clusters, and require a priori the unknown number of clusters. We here develop a new clustering algorithm for large data of mixed type, aiming at improving the applicability and efficiency of the peak-finding technique. The improvements are threefold: (1) the new algorithm is applicable to mixed data; (2) the algorithm is capable of detecting outliers and clusters of relatively lower density values; (3) the algorithm is competent at deciding the correct number of clusters. The computational complexity of the algorithm is greatly reduced by applying a fast k-nearest neighbors method and by scaling down to component sets. We present experimental results to verify that our algorithm works well in practice. Keywords: Clustering; Big Data; Mixed Attribute; Density Peaks; Nearest-Neighbor Graph; Conductance.

翻译：大量混合数据是数据开采的一个中心问题。许多方法采用了 k 手段的概念, 因而对初始化十分敏感, 只探测球类组, 并且先验地要求数量未知的组群。我们在这里为混合型的大数据开发一种新的群集算法, 目的是提高峰值调查技术的适用性和效率。改进有三重:(1) 新的算法适用于混合数据;(2) 算法能够探测密度值相对较低的离子和组群;(3) 算法有能力决定组群的正确数量。算法的计算复杂性通过采用快速的 k 最接近的邻里方法和缩小到组件组来大大降低。我们提出了实验结果, 以核实我们的算法在实践中效果良好。关键词: 组合; 大数据; 混合属性; Density Peaks; Neest- Neighbor 图表; 行为。

0

相关内容

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

325+阅读 · 2020年11月26日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

【KDD2020】CAST:一种基于相关关系的多尺度数据自适应光谱聚类算法,CAST: A Correlation-based Adaptive Spectral Clustering Algorithm on Multi-scale Data

【KDD2020】CAST:一种基于相关关系的多尺度数据自适应光谱聚类算法,CAST: A Correlation-based Adaptive Spectral Clustering Algorithm on Multi-scale Data

专知会员服务

20+阅读 · 2020年6月11日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

93+阅读 · 2020年2月12日

【Python Tricks新书】The book: A Buffet of Awesome Python Features，299页pdf

【Python Tricks新书】The book: A Buffet of Awesome Python Features，299页pdf

专知会员服务

45+阅读 · 2020年1月1日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

已删除

将门创投

6+阅读 · 2019年4月22日

On the importance of functions in data modeling

On the importance of functions in data modeling

Arxiv

0+阅读 · 2020年12月31日

On Perfect Classification and Clustering for Gaussian Processes

Arxiv

0+阅读 · 2020年12月30日

New Partitioning Techniques and Faster Algorithms for Approximate Interval Scheduling

Arxiv

0+阅读 · 2020年12月30日

Spectral Clustering with Graph Neural Networks for Graph Pooling

Arxiv

0+阅读 · 2020年12月29日

Gradient Descent Maximizes the Margin of Homogeneous Neural Networks

Arxiv

0+阅读 · 2020年12月29日

On Feature Normalization and Data Augmentation

On Feature Normalization and Data Augmentation

Arxiv

15+阅读 · 2020年2月25日

Attributed Graph Clustering via Adaptive Graph Convolution

Arxiv

11+阅读 · 2019年6月4日

Linkage Based Face Clustering via Graph Convolution Network

Arxiv

16+阅读 · 2019年3月27日

Small Data Challenges in Big Data Era: A Survey of Recent Progress on Unsupervised and Semi-Supervised Methods

Small Data Challenges in Big Data Era: A Survey of Recent Progress on Unsupervised and Semi-Supervised Methods

Arxiv

88+阅读 · 2019年3月27日

Efficient Parameter-free Clustering Using First Neighbor Relations

Efficient Parameter-free Clustering Using First Neighbor Relations

Arxiv

7+阅读 · 2019年2月28日

VIP会员

文章信息

相关主题

相关VIP内容

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

325+阅读 · 2020年11月26日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

【KDD2020】CAST:一种基于相关关系的多尺度数据自适应光谱聚类算法,CAST: A Correlation-based Adaptive Spectral Clustering Algorithm on Multi-scale Data

【KDD2020】CAST:一种基于相关关系的多尺度数据自适应光谱聚类算法,CAST: A Correlation-based Adaptive Spectral Clustering Algorithm on Multi-scale Data

专知会员服务

20+阅读 · 2020年6月11日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

93+阅读 · 2020年2月12日

【Python Tricks新书】The book: A Buffet of Awesome Python Features，299页pdf

【Python Tricks新书】The book: A Buffet of Awesome Python Features，299页pdf

专知会员服务

45+阅读 · 2020年1月1日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

隐身自主无人水下航行器技术如何变革水下作战并重塑海军竞争

《俄乌战争中的无人系统：新的战争方式与新兴趋势——来自前线的印象》报告

《海上自主水面船舶远程操作中心：安全可持续运行的多维度分析》

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

已删除

将门创投

6+阅读 · 2019年4月22日

相关论文

On the importance of functions in data modeling

On the importance of functions in data modeling

Arxiv

0+阅读 · 2020年12月31日

On Perfect Classification and Clustering for Gaussian Processes

Arxiv

0+阅读 · 2020年12月30日

New Partitioning Techniques and Faster Algorithms for Approximate Interval Scheduling

Arxiv

0+阅读 · 2020年12月30日

Spectral Clustering with Graph Neural Networks for Graph Pooling

Arxiv

0+阅读 · 2020年12月29日

Gradient Descent Maximizes the Margin of Homogeneous Neural Networks

Arxiv

0+阅读 · 2020年12月29日

On Feature Normalization and Data Augmentation

On Feature Normalization and Data Augmentation

Arxiv

15+阅读 · 2020年2月25日

Attributed Graph Clustering via Adaptive Graph Convolution

Arxiv

11+阅读 · 2019年6月4日

Linkage Based Face Clustering via Graph Convolution Network

Arxiv

16+阅读 · 2019年3月27日

Small Data Challenges in Big Data Era: A Survey of Recent Progress on Unsupervised and Semi-Supervised Methods

Small Data Challenges in Big Data Era: A Survey of Recent Progress on Unsupervised and Semi-Supervised Methods

Arxiv

88+阅读 · 2019年3月27日

Efficient Parameter-free Clustering Using First Neighbor Relations

Efficient Parameter-free Clustering Using First Neighbor Relations

Arxiv

7+阅读 · 2019年2月28日

微信扫码咨询专知VIP会员