This paper presents noise-robust clustering techniques for unsupervised machine learning. Uncertainty about noise, consistency, and other ambiguities can become a severe obstacle in data analytics. As a result, data quality, cleansing, management, and governance remain critical disciplines when working with Big Data. Given this complexity, it is no longer sufficient to treat data deterministically, as in the classical setting; it becomes meaningful to account for the noise distribution and its impact on data sample values. Classical clustering methods group data into "similarity classes" according to their relative distances or similarities in the underlying space. This paper addresses the problem by extending classical $K$-means and $K$-medoids clustering to data distributions (rather than the raw data). This involves measuring distances among distributions using two types of measures: optimal mass transport (also called the Wasserstein distance, denoted $W_2$) and a novel distance measure proposed in this paper, the expected value of the random variable distance (denoted ED). The proposed distribution-based $K$-means and $K$-medoids algorithms cluster the data distributions first and then assign each raw data point to the cluster of its distribution.
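To make the pipeline concrete, the sketch below implements distribution-based $K$-medoids over one-dimensional empirical distributions using the $W_2$ distance. It is a minimal illustration under stated assumptions, not the paper's implementation: the function names (`w2_empirical`, `k_medoids`), the synthetic batches, and the PAM-style medoid update are all hypothetical, the sketch is restricted to 1-D data with equal batch sizes, and the ED measure is omitted since its definition is introduced later in the paper.

```python
import numpy as np

def w2_empirical(x, y):
    """1-D W2 between two empirical distributions with equal sample sizes:
    the root-mean-square gap between sorted samples (quantile functions)."""
    xs, ys = np.sort(x), np.sort(y)
    return np.sqrt(np.mean((xs - ys) ** 2))

def k_medoids(dists, k, n_iter=100, seed=0):
    """Plain K-medoids (PAM-style alternation) on a precomputed distance matrix."""
    rng = np.random.default_rng(seed)
    n = dists.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        # assign each distribution to its nearest medoid
        labels = np.argmin(dists[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if members.size:
                # the medoid is the member minimizing total within-cluster distance
                within = dists[np.ix_(members, members)].sum(axis=1)
                new_medoids[j] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dists[:, medoids], axis=1)
    return labels, medoids

# Hypothetical setup: each object to cluster is a batch of noisy samples,
# i.e. an empirical distribution rather than a single deterministic point.
rng = np.random.default_rng(1)
batches = [rng.normal(mu, 1.0, size=200) for mu in (0, 0, 5, 5, 10, 10)]
n = len(batches)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = w2_empirical(batches[i], batches[j])

labels, medoids = k_medoids(D, k=3)
print(labels)  # each raw sample then inherits the label of its distribution
```

The choice of sorted samples in `w2_empirical` relies on the fact that, in one dimension, $W_2$ reduces to the $L^2$ distance between quantile functions, which for equal-size samples is simply the RMS difference between order statistics.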