We consider $k$-means clustering in the online no-substitution setting, where one must decide whether to take each data point $x_t$ as a center immediately upon streaming it and cannot remove centers once taken. Our work is focused on the \emph{arbitrary-order} assumption, where there are no restrictions on how the points $X$ are ordered or generated. Algorithms in this setting are evaluated with respect to their approximation ratio compared to the optimal clustering cost, the number of centers they select, and their memory usage. Recently, Bhattacharjee and Moshkovitz (2020) defined a parameter, $Lower_{\alpha, k}(X)$, that governs the minimum number of centers any $\alpha$-approximation clustering algorithm, allowed any amount of memory, must take given input $X$. To complement their result, we give the first algorithm that takes $\tilde{O}(Lower_{\alpha,k}(X))$ centers (hiding factors of $k, \log n$) while simultaneously achieving a constant approximation and using $\tilde{O}(k)$ memory in addition to the memory required to save the centers. Our algorithm shows that, in the no-substitution setting, it is possible to take an order-optimal number of centers while using little additional memory.
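To make the setting concrete, the following minimal Python sketch illustrates the no-substitution protocol only: each point is seen once, the decision to take it as a center is made immediately, and taken centers are never removed. The names `run_no_substitution`, `threshold_rule`, and `kmeans_cost` are hypothetical, and the distance-threshold rule is a placeholder assumption chosen for illustration; it is not the algorithm of this paper and carries no approximation guarantee.

```python
from typing import Callable, List, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """k-means cost: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(x, c) for c in centers) for x in points)


def run_no_substitution(
    stream: Sequence[Point],
    take: Callable[[Point, List[Point]], bool],
) -> List[Point]:
    """Stream the points once; each decision is immediate and irrevocable."""
    centers: List[Point] = []
    for x in stream:
        if take(x, centers):
            centers.append(x)  # once taken, a center is never removed
    return centers


def threshold_rule(radius_sq: float) -> Callable[[Point, List[Point]], bool]:
    """Placeholder rule (an assumption, not the paper's method):
    take a point if it is farther than radius_sq from every current center."""
    def take(x: Point, centers: List[Point]) -> bool:
        return not centers or min(sq_dist(x, c) for c in centers) > radius_sq
    return take


if __name__ == "__main__":
    stream = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.0, 10.1)]
    centers = run_no_substitution(stream, threshold_rule(1.0))
    print(centers, kmeans_cost(stream, centers))
```

The three evaluation criteria named above map onto this sketch as follows: the approximation ratio compares `kmeans_cost(stream, centers)` to the optimal $k$-means cost, the number of centers is `len(centers)`, and the memory usage is whatever state the decision rule keeps beyond the centers themselves.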