Offline k-means clustering has been studied extensively, and constant-factor approximation algorithms are available. Online clustering, however, remains largely uncharted. New factors come into play, such as the ordering of the dataset and whether the number of points, n, is known in advance, and their exact effects are not well understood. In this paper we focus on the online setting where decisions are irreversible: after a point arrives, the algorithm must decide whether or not to take the point as a center, and this decision is final. How many centers are necessary and sufficient to achieve a constant approximation in this setting? We show upper and lower bounds for all the different cases. These bounds match up to a constant factor and are therefore optimal. For example, for the k-means cost with constant k>1 and a random-order stream, Theta(log n) centers are necessary and sufficient to achieve a constant approximation, while a priori knowledge of n alone reduces the required number of centers to a constant. These bounds hold for any distance function that satisfies a triangle-type inequality.
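To make the irreversible online setting concrete, the sketch below (a minimal illustration, not an algorithm from the paper) streams points one at a time and applies a simple probabilistic rule in the spirit of online facility location: each arriving point is either taken as a center immediately or discarded for good. The function name `online_centers` and the `threshold` parameter are hypothetical choices made for the example.

```python
import random

def online_centers(stream, threshold):
    """Illustrative online center selection with irreversible decisions.

    `stream` yields points (tuples of floats); `threshold` is a hypothetical
    distance parameter. Each arriving point is either taken as a center
    immediately or discarded forever -- the decision is never revisited.
    """
    centers = []
    for point in stream:
        if not centers:
            centers.append(point)  # the first point always becomes a center
            continue
        # squared Euclidean distance to the nearest existing center (k-means cost)
        d2 = min(sum((a - b) ** 2 for a, b in zip(point, c)) for c in centers)
        # take the point as a center with probability proportional to its cost,
        # capped at 1 -- a simple rule, not the paper's method
        if random.random() < min(1.0, d2 / threshold):
            centers.append(point)
    return centers

# Usage: a random-order stream of 2-D points with a hypothetical threshold
points = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(1000)]
print(len(online_centers(points, threshold=5.0)))
```

Under assumptions like these, the number of centers taken grows with n; the paper's bounds characterize how many centers any such irreversible rule must and can take under the various orderings and knowledge of n.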