This paper presents a batch-wise, density-based clustering approach for local outlier detection in massive-scale datasets. Unlike well-known traditional algorithms, which assume that all data is memory-resident, the proposed method is scalable and processes the input data chunk by chunk within the confines of a limited memory buffer. In the first phase, a temporary clustering model is built; it is then gradually updated by analyzing consecutive memory-loads of points. At the end of this scalable clustering phase, an approximation of the original cluster structure is obtained. Finally, in one more scan of the entire dataset, a suitable criterion assigns each object an outlying score, called SDCOR (Scalable Density-based Clustering Outlierness Ratio). Evaluations on real-life and synthetic datasets demonstrate that the proposed method has low, linear time complexity and is more effective and efficient than the best-known conventional density-based methods, which must load all data into memory, as well as some fast distance-based methods that can operate on disk-resident data.
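For intuition only, the following is a minimal Python/NumPy sketch of the two-pass, limited-memory pipeline the abstract outlines. It is not the paper's method: the incremental model is a mini-batch-k-means-style stand-in for the density-based clustering model, and the distance-to-nearest-centroid score is an illustrative substitute for the actual SDCOR criterion; all names (`ChunkedClusteringModel`, `detect_outliers`, `make_chunks`) are hypothetical.

```python
# Sketch of a two-pass, chunk-by-chunk outlier detector. The model update and
# scoring rule are illustrative stand-ins, NOT the paper's density-based model
# or the SDCOR criterion.
import numpy as np


class ChunkedClusteringModel:
    """Running per-cluster statistics, updated one memory-load (chunk) at a time."""

    def __init__(self, n_clusters, seed=0):
        self.n_clusters = n_clusters
        self.rng = np.random.default_rng(seed)
        self.centroids = None
        self.counts = None

    def partial_fit(self, chunk):
        if self.centroids is None:
            # Phase 1: seed a temporary model from the first memory-load.
            idx = self.rng.choice(len(chunk), self.n_clusters, replace=False)
            self.centroids = chunk[idx].astype(float).copy()
            self.counts = np.zeros(self.n_clusters)
        # Gradually update the model with each consecutive chunk: assign points
        # to the nearest centroid, then shift the running means toward them.
        d = np.linalg.norm(chunk[:, None, :] - self.centroids[None], axis=2)
        labels = d.argmin(axis=1)
        for k in np.unique(labels):
            pts = chunk[labels == k]
            self.counts[k] += len(pts)
            self.centroids[k] += (len(pts) / self.counts[k]) * (
                pts.mean(axis=0) - self.centroids[k]
            )

    def score(self, chunk):
        # Illustrative outlierness score: distance to the nearest centroid.
        d = np.linalg.norm(chunk[:, None, :] - self.centroids[None], axis=2)
        return d.min(axis=1)


def detect_outliers(make_chunks, n_clusters, top_frac=0.01):
    """make_chunks: zero-arg callable yielding the dataset chunk by chunk,
    so the data can be scanned twice without ever being memory-resident."""
    model = ChunkedClusteringModel(n_clusters)
    for chunk in make_chunks():          # scan 1: build approximate clusters
        model.partial_fit(chunk)
    scores = np.concatenate(             # scan 2: score every object
        [model.score(chunk) for chunk in make_chunks()]
    )
    return scores, scores > np.quantile(scores, 1.0 - top_frac)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = np.vstack([rng.normal(0, 1, (5000, 2)),      # two dense clusters...
                      rng.normal(8, 1, (5000, 2)),
                      rng.uniform(-10, 18, (100, 2))])  # ...plus sparse noise
    rng.shuffle(data)
    chunks = lambda: (data[i:i + 1000] for i in range(0, len(data), 1000))
    scores, flagged = detect_outliers(chunks, n_clusters=2)
    print(f"flagged {flagged.sum()} of {len(data)} points as outliers")
```

The key structural point the sketch preserves is the two-scan design: one pass over chunk-sized memory-loads to build an approximate cluster structure, and a second pass to assign every object a score against that structure, so memory use stays bounded regardless of dataset size.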