No-Substitution $k$-means Clustering with Optimal Center Complexity and Low Memory - Zhuanzhi Papers



November 8, 2021


Robi Bhattacharjee, Jacob Imola

We consider $k$-means clustering in the online no-substitution setting, where one must decide whether to take each data point $x_t$ as a center immediately upon streaming it and cannot remove centers once taken. Our work focuses on the arbitrary-order assumption, under which there are no restrictions on how the points $X$ are ordered or generated. Algorithms in this setting are evaluated by their approximation ratio relative to the optimal clustering cost, the number of centers they select, and their memory usage. Recently, Bhattacharjee and Moshkovitz (2020) defined a parameter, $Lower_{\alpha, k}(X)$, that governs the minimum number of centers any $\alpha$-approximation clustering algorithm must take on input $X$, even if it is allowed unlimited memory. To complement their result, we give the first algorithm that takes $\tilde{O}(Lower_{\alpha,k}(X))$ centers (hiding factors of $k$ and $\log n$) while simultaneously achieving a constant approximation and using $\tilde{O}(k)$ memory in addition to the memory required to save the centers. Our algorithm shows that in the no-substitution setting, it is possible to take an order-optimal number of centers while using little additional memory.
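To make the no-substitution setting concrete, here is a minimal Python sketch under stated assumptions. It is not the algorithm from the paper: `threshold_no_substitution` and its `threshold` parameter are hypothetical, and they implement a deliberately naive fixed-threshold rule chosen only to illustrate the constraints. Each streamed point is either taken as a center on the spot or discarded forever, and centers are never removed. The sketch also shows the three quantities the abstract uses for evaluation: the $k$-means cost $\mathrm{cost}(X, C) = \sum_{x \in X} \min_{c \in C} \lVert x - c \rVert^2$, the number of centers $|C|$, and the memory kept beyond the centers.

```python
import math
from typing import Iterable, List, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance ||a - b||^2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_cost(points: Iterable[Point], centers: List[Point]) -> float:
    """k-means cost: sum over points of the squared distance to the nearest center.

    Offline evaluation helper only: it needs all of X, which the streaming
    algorithm itself never stores.
    """
    return sum(min(sq_dist(x, c) for c in centers) for x in points)


def threshold_no_substitution(stream: Iterable[Point], threshold: float) -> List[Point]:
    """Hypothetical fixed-threshold rule (NOT the paper's algorithm).

    Each point is seen once. It becomes a center immediately if its squared
    distance to every center taken so far exceeds `threshold`; otherwise it
    is discarded. Centers are never removed (no substitution).
    """
    centers: List[Point] = []
    for x in stream:
        # Distance to the nearest existing center (infinite if there are none yet).
        nearest = min((sq_dist(x, c) for c in centers), default=math.inf)
        if nearest > threshold:
            centers.append(x)  # irrevocable decision: x stays a center forever
        # a rejected point is not stored, so it can never become a center later
    return centers


if __name__ == "__main__":
    stream = [(0.0,), (0.1,), (10.0,), (10.2,), (20.0,), (0.05,), (19.9,)]
    centers = threshold_no_substitution(stream, threshold=1.0)
    print(centers)                       # [(0.0,), (10.0,), (20.0,)]
    print(len(centers))                  # number of centers taken
    print(kmeans_cost(stream, centers))  # cost, computed offline for evaluation
```

Apart from the list of centers, the streaming loop keeps only the threshold and the current point, so its extra memory is constant. A fixed threshold like this one carries no constant-approximation guarantee relative to the optimal $k$-clustering cost, and under an adversarial arrival order it can take far more centers than needed. The paper's contribution is an algorithm that achieves a constant approximation with $\tilde{O}(Lower_{\alpha,k}(X))$ centers and $\tilde{O}(k)$ additional memory.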


